Re: [RFC 1/3] oom, sysrq: Skip over oom victims and killed tasks

From: Michal Hocko
Date: Thu Jan 14 2016 - 06:00:49 EST


On Wed 13-01-16 16:38:26, David Rientjes wrote:
> On Wed, 13 Jan 2016, Michal Hocko wrote:
[...]
> > > I think it would be
> > > better for sysrq+f to first select a process with fatal_signal_pending()
> > > set so it silently gets access to memory reserves and then a second
> > > sysrq+f to choose a different process, if necessary, because of
> > > TIF_MEMDIE.
> >
> > The disadvantage of this approach is that sysrq+f might silently be
> > ignored and the administrator doesn't have any signal about that.

Sorry, I meant to say "administrator doesn't know why it has been
ignored". But it would have been better to say "administrator cannot do
anything about that".

> The administrator can check the kernel log for an oom kill.

What should the admin do when the request got ignored, though? sysrq+i?
sysrq+b?

> Killing additional processes is not going to help

Whether it is going to help or not is a different topic. My point is
that we have a sysrq action which might get ignored without providing
any explanation. But what I consider a much bigger issue is that the
deliberate request of the admin is ignored in the first place. As an
admin, I do not expect the system to know better than me when I perform
some action.

> and has never been the semantics
> of the sysrq trigger, it is quite clearly defined as killing a process
> when out of memory,

I disagree. Being OOM has never been a requirement for sysrq+f to kill
a task. It should kill _a_ memory hog. Your system might be thrashing to
the point that you are not able to log in and resolve the situation in a
reasonable time, yet you are still not OOM. sysrq+f is your only choice
then.

> not serial killing everything on the machine.

Which is not what is proposed here. The only thing I would like to
achieve is to get rid of the OOM killer heuristics which assume some
forward progress and therefore refrain from killing another task. Those
make perfect sense when the system tries to resolve the OOM condition,
but when the administrator has clearly asked to _kill a_ memory hog, the
result should be killing a task which consumes memory (ideally the
largest one).
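To make the difference concrete, here is a simplified user-space model
(not the actual mm/oom_kill.c code). struct task, select_heuristic() and
select_sysrq() are hypothetical names; the fatal_signal_pending and
oom_victim flags merely stand in for fatal_signal_pending() and
TIF_MEMDIE, and raw RSS stands in for the real badness score.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified task model; NOT the kernel's task_struct. */
struct task {
	const char *comm;
	unsigned long rss_pages;   /* stand-in for the badness score */
	bool fatal_signal_pending; /* stand-in for fatal_signal_pending() */
	bool oom_victim;           /* stand-in for TIF_MEMDIE */
};

/*
 * Heuristic suited to a kernel-triggered OOM: if some task is already
 * dying, assume it will free memory soon and kill nothing (NULL).
 * From sysrq+f this means the request is silently ignored.
 */
static const struct task *select_heuristic(const struct task *t, size_t n)
{
	const struct task *best = NULL;

	for (size_t i = 0; i < n; i++) {
		if (t[i].oom_victim || t[i].fatal_signal_pending)
			return NULL; /* wait for forward progress */
		if (!best || t[i].rss_pages > best->rss_pages)
			best = &t[i];
	}
	return best;
}

/*
 * Behaviour argued for sysrq+f: skip tasks that are already dying and
 * pick the largest remaining memory consumer. Returns NULL only when
 * no eligible task is left.
 */
static const struct task *select_sysrq(const struct task *t, size_t n)
{
	const struct task *best = NULL;

	for (size_t i = 0; i < n; i++) {
		if (t[i].oom_victim || t[i].fatal_signal_pending)
			continue; /* already being killed, look further */
		if (!best || t[i].rss_pages > best->rss_pages)
			best = &t[i];
	}
	return best;
}
```

With a stuck victim "db" plus "browser" and "shell", select_heuristic()
returns NULL (nothing happens), while select_sysrq() returns "browser",
the largest task that is not already dying.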

What would be a regression scenario for this change? I can clearly see
a deficiency in the current implementation, so I believe we should
weigh the pros and cons here.
--
Michal Hocko
SUSE Labs