Re: [RFC] [PATCH] Cgroup based OOM killer controller

From: David Rientjes
Date: Thu Jan 22 2009 - 17:29:55 EST

On Fri, 23 Jan 2009, Evgeniy Polyakov wrote:

> I showed the case when it does not work at all. And then found (in this
> mail), that task (part) has to be present in the memory, which means it
> will be locked, which in turns will not work with the system which
> already locked its range allowed by the limits.

Yes, a userspace oom handler must be sanely implemented.

> And returning to the oom_adj and cpusets tunables. Why any new process
> started in given cpuset can not be tuned by external application or some
> script to have bigger/smaller oom_adj parameter? :)

oom_adj scores are separate from the hard-coded and very fundamental
heuristic that we should kill a task that has memory allocated on nodes we
are attempting to free. Anything else would just be stupid.

> > How do I prioritize oom killing if my system is running cpusets, then?
> Just the way it works right now :)
> You do not object against patches which improve superh cpu support
> with the argument, that it is not possible to enable that feature,
> when system does not have superh cpu.

No, I object against any patch that isn't a complete solution to the
problem being presented. It's purely a matter of good software
engineering practices and in the interest of a long-term maintainable

> > The userspace handler is a schedulable task resident in memory that, with
> > any sane implementation, would not require additional memory when running.
> And what happens when it can not lock the memory because of the limits?

Any sane handler for responding to oom conditions will not require
additional memory from nodes that are under oom, whether that includes all
system memory or a subset, if it is attached to the oom notifier.

> Hmm, you likely missed the part in the last line. And in the first two,
> where I said that before oom-killer started (and killed some processes,
> usually not those which were need, but its a different story). System
> just did not have a free memory to have _any_ progress neither in atomic
> context, nor in process, so it had to invoke an oom-killer.

The page allocator cannot invoke the oom killer in atomic context, so this
would be happening in process content where it can sleep. The userspace
oom handler will wake up, handle the condition either by relaxing hardwall
restrictions for either the memory controller or cpusets, or killing a
task itself unless it chooses to defer to the kernel.

> In that case userspace just can not reply or even awake. While kernel is
> effectively alive if it does not need to allocate a memory. And could
> kill some process to free up the ram.

Wrong, oom conditions do not preempt task scheduling.

> Userspace notifications are great, no problem, but do not rely on them,
> since there is a huge world outside the case it works in, which will be
> quite unhappy when systems start freezing because oom-killer relied on
> the userspace.

I'm quite certain you've spent more time writing emails to me than merging
the patch and testing its possibilities, given your lack of understanding
of its very basic concepts.
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at
Please read the FAQ at