RE: [PATCH 3.2.0-rc1 3/3] Used Memory Meter pseudo-device module

From: David Rientjes
Date: Wed Jan 11 2012 - 16:44:47 EST


On Wed, 11 Jan 2012, leonid.moiseichuk@xxxxxxxxx wrote:

> > So if the page allocator can make no progress in freeing memory, we would
> > introduce a delay in out_of_memory() if it were configured via a sysctl from
> > userspace. When this delay is started, applications waiting on this event can
> > be notified with eventfd(2) that the delay has started and they have
> > however many milliseconds to address the situation. When they rewrite the
> > sysctl, the delay is cleared. If they don't rewrite the sysctl and the delay
> > expires, the oom killer proceeds with killing.
> >
> > What's missing for your use case with this proposal?
>
> Timed delays in multi-process OOM handling look like a fragile
> construction to me, because the delays are not predictable.

Not sure what you mean by predictable; the oom conditions themselves
certainly aren't predictable, otherwise you wouldn't need notification at
all. The delay is predictable since you configure it as a number of
milliseconds via a global sysctl. Userspace can either handle the oom itself
and rewrite that sysctl to reset the delay, or write 0 to make the kernel
oom kill immediately. If the delay expires, it is assumed that userspace
is dead and the kernel proceeds with the kill to avoid livelock.
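
To make the three outcomes concrete, here is a minimal Python model of the
delay semantics described above. It is only a sketch: the sysctl is a
proposal in this thread, and the names Outcome and resolve_oom_delay, the
(time_ms, value) representation of userspace writes, and the rule that a
write at exactly the expiry time is too late are all assumptions made for
illustration.

```python
from enum import Enum


class Outcome(Enum):
    USERSPACE_HANDLED = "userspace rewrote the sysctl; delay cleared, no kill"
    KILL_IMMEDIATELY = "oom killer runs right away"
    KILL_AFTER_TIMEOUT = "delay expired; oom killer runs"


def resolve_oom_delay(delay_ms, writes):
    """Model one oom event under the proposed delay (hypothetical helper).

    delay_ms: configured delay; 0 means no delay is configured.
    writes:   (time_ms, value) pairs for userspace writes to the sysctl,
              with time_ms measured from the start of the delay.
    """
    if delay_ms <= 0:
        # No delay configured: the oom killer proceeds without waiting.
        return Outcome.KILL_IMMEDIATELY
    for time_ms, value in sorted(writes):
        if time_ms >= delay_ms:
            # Assumption: a write at or after expiry arrives too late.
            break
        if value == 0:
            # Writing 0 asks the kernel to oom kill immediately.
            return Outcome.KILL_IMMEDIATELY
        # Any other rewrite clears the delay: userspace handled the oom.
        return Outcome.USERSPACE_HANDLED
    # Nobody wrote in time: userspace is assumed dead.
    return Outcome.KILL_AFTER_TIMEOUT
```

For example, resolve_oom_delay(500, [(120, 500)]) models userspace freeing
memory 120 ms into a 500 ms delay and rewriting the sysctl, which yields
Outcome.USERSPACE_HANDLED, while resolve_oom_delay(500, []) models a dead
userspace and yields Outcome.KILL_AFTER_TIMEOUT.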

> Memcg supports [1] a better approach: freeze the whole group and kick
> the designated user-space application to handle it. We planned
> to use it as follows:
> - enlarge cgroup
> - send SIGTERM to selected "bad" application e.g. based on oom_score
> - wait a bit
> - send SIGKILL to "bad" application
> - reduce group size
>
> But in the end the default OOM killer turned out to work fine.
>

I think you're misunderstanding the proposal; in the case of a global oom
(that is, without memcg), all threads that are allocating memory would,
by definition, be frozen and incur the delay at the point where they
would currently call into the oom killer. If your userspace is alive,
i.e. the application responsible for managing oom killing, then it can
wait on eventfd(2), wake up, and then send SIGTERM and SIGKILL to the
appropriate threads based on priority.
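
The userspace side of that flow could look roughly like the Python sketch
below. It is not a definitive implementation: how the eventfd would be
registered with the kernel for the proposed delay is not specified here,
so the sketch only covers blocking on an already-registered descriptor and
then escalating from SIGTERM to SIGKILL. The function names, the grace
period, and the polling interval are all hypothetical.

```python
import os
import signal
import struct
import time


def wait_for_notification(event_fd):
    """Block until the eventfd is signalled; return its counter value.

    A read on an eventfd yields an 8-byte unsigned integer in host
    byte order.
    """
    return struct.unpack("=Q", os.read(event_fd, 8))[0]


def process_exists(pid):
    """Probe for pid with signal 0, which delivers nothing.

    Note: a zombie that its parent has not reaped yet still exists.
    """
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but owned by another user
    return True


def terminate_then_kill(pid, grace_seconds=1.0, poll_interval=0.01):
    """Send SIGTERM, wait up to grace_seconds, then send SIGKILL.

    Returns "terminated" if the process went away within the grace
    period (or was already gone), "killed" if SIGKILL was needed.
    """
    try:
        os.kill(pid, signal.SIGTERM)
    except ProcessLookupError:
        return "terminated"
    deadline = time.monotonic() + grace_seconds
    while time.monotonic() < deadline:
        if not process_exists(pid):
            return "terminated"
        time.sleep(poll_interval)
    try:
        os.kill(pid, signal.SIGKILL)
    except ProcessLookupError:
        return "terminated"  # exited just before the SIGKILL
    return "killed"
```

A manager process would call wait_for_notification(fd) in a loop, pick a
victim by whatever priority scheme it uses, and pass that pid to
terminate_then_kill(). In Python 3.10 and later on Linux, os.eventfd(0)
creates a suitable descriptor for trying this out locally.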

So, again, why wouldn't this work for you?