Re: [patch 1/2] mm, memcg: avoid oom notification when current needs access to memory reserves

From: Andrew Morton
Date: Thu Jan 09 2014 - 19:12:54 EST


On Thu, 9 Jan 2014 16:01:15 -0800 (PST) David Rientjes <rientjes@xxxxxxxxxx> wrote:

> On Thu, 9 Jan 2014, Andrew Morton wrote:
>
> > > I'm not sure why this was dropped since it's vitally needed for any sane
> > > userspace oom handler to be effective.
> >
> > It was dropped because the other memcg developers disagreed with it.
> >
>
> It was acked-by Michal.

And Johannes?

> > I'd really prefer not to have to spend a great amount of time parsing
> > argumentative and repetitive emails to make a tie-break decision which
> > may well be wrong anyway.
> >
> > Please work with the other guys to find an acceptable implementation.
> > There must be *something* we can do?
> >
>
> We REQUIRE this behavior for a sane userspace oom handler implementation.
> You've snipped my email quite extensively, but I'd like to know
> specifically how you would implement a userspace oom handler described by
> Section 10 of Documentation/cgroups/memory.txt without this patch?

From long experience I know that if I suggest an alternative
implementation, advocates of the initial implementation will invest
great effort in demonstrating why my suggestion won't work while
investing zero effort in thinking up alternatives themselves.

> Are you suggesting that userspace is supposed to wait for successive
> wakeups over some arbitrarily defined period of time to determine whether
> memory freeing (i.e. a process in the exit() path or with a pending
> SIGKILL making forward progress to free its memory) can be done or whether
> it needs to do something to free memory? If not, how else is userspace
> supposed to know that it should act?
>
> How do you prevent unnecessary oom killing if the userspace oom handler
> wakes up and kills something concurrent with the process triggering the
> notification getting access to memory reserves, exiting, and freeing its
> memory? Userspace just killed a process unnecessarily. This is the exact
> reason why the kernel oom killer doesn't do a damn thing in these
> conditions, because it's NOT ACTIONABLE by the oom killer, a process
> simply needs to exit.

So the interface is wrong. We have two semantically different kernel
states which are being communicated to userspace in the same way, so
userspace cannot disambiguate.
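
For reference, a minimal sketch of how a userspace handler registers for
the notification under discussion, following the eventfd scheme described
in Section 10 of Documentation/cgroups/memory.txt. The helper names
(format_event_control, register_oom_eventfd, wait_for_oom_event) and the
memcg directory argument are assumptions made for illustration, not part
of any kernel or library API. The point it illustrates: the eventfd
delivers only a counter, so a handler reading it has no way to tell an
actionable oom from one where a task merely needs to exit.

```c
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Build the "<event_fd> <oom_control_fd>" line for cgroup.event_control. */
static int format_event_control(char *buf, size_t len, int efd, int cfd)
{
	int n = snprintf(buf, len, "%d %d", efd, cfd);

	return (n < 0 || (size_t)n >= len) ? -1 : n;
}

/*
 * Register for oom notifications on the memcg at memcg_dir (hypothetical
 * path supplied by the caller). Returns the eventfd, or -1 on failure.
 * The memory.oom_control fd is handed back so the caller can keep it
 * open for as long as it listens (a conservative choice in this sketch).
 */
static int register_oom_eventfd(const char *memcg_dir, int *cfd_out)
{
	char path[4096], line[32];
	ssize_t written = -1;
	int efd, cfd, ecfd, n;

	efd = eventfd(0, 0);
	if (efd < 0)
		return -1;

	snprintf(path, sizeof(path), "%s/memory.oom_control", memcg_dir);
	cfd = open(path, O_RDONLY);
	if (cfd < 0)
		goto err_efd;

	snprintf(path, sizeof(path), "%s/cgroup.event_control", memcg_dir);
	ecfd = open(path, O_WRONLY);
	if (ecfd < 0)
		goto err_cfd;

	n = format_event_control(line, sizeof(line), efd, cfd);
	if (n > 0)
		written = write(ecfd, line, (size_t)n);
	close(ecfd);
	if (n < 0 || written != n)
		goto err_cfd;

	*cfd_out = cfd;
	return efd;

err_cfd:
	close(cfd);
err_efd:
	close(efd);
	return -1;
}

/*
 * Block until the eventfd fires. All userspace receives is a 64-bit
 * counter; nothing in it says why the notification was sent.
 */
static int wait_for_oom_event(int efd, uint64_t *count)
{
	ssize_t r = read(efd, count, sizeof(*count));

	return r == (ssize_t)sizeof(*count) ? 0 : -1;
}
```

A handler would call register_oom_eventfd() once on its memcg directory
and then loop on wait_for_oom_event(); whatever policy it applies has to
be decided without any further information from that read.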

Solution: invent a better communication scheme with a richer payload.
Use that, deprecate the old interface if poss.

Another solution: add a mode knob to select between alternative kernel
behaviors (yuk).

Another solution: get David to think of a solution which addresses the
issues which others have raised.

Johannes' final email in this thread has yet to be replied to, btw.