Re: [PATCH v3] mm: memcontrol: Don't flood OOM messages with no eligible task.

From: Sergey Senozhatsky
Date: Thu Oct 18 2018 - 02:10:31 EST


On (10/18/18 14:26), Tetsuo Handa wrote:
> Sergey Senozhatsky wrote:
> > To my personal taste, "baud rate of registered and enabled consoles"
> > approach is drastically more relevant than hard coded 10 * HZ or
> > 60 * HZ magic numbers... But not in the form of that "min baud rate"
> > brain fart, which I have posted.
>
> I'm saying that my 60 * HZ is "duration which the OOM killer keeps refraining
> from calling printk()". Such period is required for allowing console users
> to do their operations without being disturbed by the OOM killer.
>

Got it. I probably haven't been paying close enough attention to this
discussion. Your commit message starts with "RCU stalls" and ends with a
completely different problem, "admin interaction", and I skipped the last
part of it.

OK. That makes sense if any user intervention/interaction actually happens.
I'm not sure that anyone at Facebook or Google logs in to every server
that is under OOM to do something critically important there. Netconsole
logs and postmortem analysis would, *perhaps*, be their choice. I believe
it was Johannes who said that his netconsole is capable of keeping up
with the traffic and that 60 * HZ is too long for him. So I can see why
people might not be happy with your patch. I don't think that 60 * HZ
enforcement will go anywhere.

Now, if your problem is
"I'm actually logged in, and want to do something
sane, how do I stop this OOM report flood because
it wipes out everything I have on my console?"

then let's formulate it as
"I'm actually logged in, and want to do something
sane, how do I stop this OOM report flood because
it wipes out everything I have on my console?"

and let's hear from MM people what they can suggest.

Michal, Andrew, Johannes, any thoughts?

For instance,
lower the console loglevel via /proc/sys/kernel/printk and suppress most
of the warnings?

// not only OOM but possibly other printk()-s that can come from
// different CPUs
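A minimal sketch of that /proc/sys/kernel/printk suggestion, assuming the
usual four-field layout of the file (console_loglevel,
default_message_loglevel, minimum_console_loglevel,
default_console_loglevel). Only messages whose level is numerically lower
than console_loglevel reach the console, so a value of 4 (KERN_WARNING)
keeps KERN_ERR and more severe messages while hiding warnings and
info-level output; everything still goes into the log buffer for dmesg.
The helper names are hypothetical, and writing the real file needs root.

```c
#include <stdio.h>

/* The four integers exposed by /proc/sys/kernel/printk, in file order. */
struct printk_levels {
	int console_loglevel;         /* levels numerically below this reach the console */
	int default_message_loglevel; /* level for printk() without a KERN_* prefix */
	int minimum_console_loglevel; /* lowest sane value for console_loglevel */
	int default_console_loglevel; /* boot-time default for console_loglevel */
};

/* Hypothetical helper: parse the file contents (e.g. "7\t4\t1\t7\n"). 0 on success. */
static int parse_printk_levels(const char *text, struct printk_levels *out)
{
	int n = sscanf(text, "%d %d %d %d",
		       &out->console_loglevel,
		       &out->default_message_loglevel,
		       &out->minimum_console_loglevel,
		       &out->default_console_loglevel);
	return n == 4 ? 0 : -1;
}

/* Hypothetical helper: never go below minimum_console_loglevel. */
static int pick_console_level(const struct printk_levels *cur, int wanted)
{
	return wanted < cur->minimum_console_loglevel ?
	       cur->minimum_console_loglevel : wanted;
}

/*
 * Hypothetical helper: write a single value. On the real sysctl file this
 * updates only the first field (console_loglevel) and leaves the other
 * three unchanged.
 */
static int write_console_level(const char *path, int level)
{
	FILE *f = fopen(path, "w");
	if (!f)
		return -1;
	int ok = fprintf(f, "%d\n", level) > 0;
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

Usage would be parsing the current contents, then calling
write_console_level("/proc/sys/kernel/printk", pick_console_level(&cur, 4)).
Since the setting is global, it quiets every printk() caller, not just the
OOM killer.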

If your problem is "syzbot hits RCU stalls", then let's do baud-rate-based
ratelimiting; I think we can derive more or less reasonable timeout
values from that.
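One way the baud-rate idea could be sketched, not code from any posted
patch: assume the slowest registered and enabled console is a serial line
with 8N1 framing (10 bits on the wire per byte), estimate how long one OOM
report takes to drain, and refuse to print another report until that much
time (times a slack factor) has passed. The function and struct names, the
report size and the slack factor are hypothetical; in-kernel code would
work in jiffies and more likely build on the existing ratelimit helpers
rather than plain milliseconds.

```c
#include <stdbool.h>
#include <stdint.h>

/* 8N1 framing: start bit + 8 data bits + stop bit per byte on the wire. */
#define WIRE_BITS_PER_BYTE 10u

/*
 * Hypothetical helper: milliseconds needed to push @bytes through a
 * console running at @baud, rounded up. A baud of 0 means "unknown"
 * (e.g. a non-serial console) and yields 0, i.e. no throttling.
 */
static uint64_t drain_time_ms(uint64_t bytes, uint32_t baud)
{
	if (baud == 0)
		return 0;
	uint64_t bits = bytes * WIRE_BITS_PER_BYTE;
	return (bits * 1000u + baud - 1) / baud; /* ceiling division */
}

/* Hypothetical state: gate OOM reports on the slowest console's speed. */
struct baud_ratelimit {
	uint32_t min_baud;     /* slowest registered and enabled console */
	uint32_t slack;        /* multiplier leaving room for other printk() users */
	uint64_t next_allowed; /* earliest time (ms) the next report may print */
};

/*
 * Allow a report of @report_bytes at @now_ms only if the previous one
 * had time to drain; on success, push the next deadline forward.
 */
static bool baud_ratelimit_allow(struct baud_ratelimit *rl, uint64_t now_ms,
				 uint64_t report_bytes)
{
	if (now_ms < rl->next_allowed)
		return false;
	rl->next_allowed = now_ms +
		drain_time_ms(report_bytes, rl->min_baud) * rl->slack;
	return true;
}
```

For a hypothetical 50,000-byte report, a 115200 baud console drains it in
about 4.3 s and a 9600 baud one in about 52 s, so the interval follows the
hardware instead of being a fixed 10 * HZ or 60 * HZ.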

-ss