Re: [BUGFIX][PATCH] oom-kill: fix NUMA constraint check with nodemask v4.2

From: David Rientjes
Date: Thu Dec 17 2009 - 17:22:13 EST

On Tue, 15 Dec 2009, KOSAKI Motohiro wrote:

> > A few requirements that I have:
> Um, good analysis! really.
> >
> > - we must be able to define when a task is a memory hogger; this is
> > currently done by /proc/pid/oom_adj relying on the overall total_vm
> > size of the task as a baseline. Most users should have a good sense
> > of when their task is using more memory than expected and killing a
> > memory leaker should always be the optimal oom killer result. A better
> > set of units other than a shift on total_vm would be helpful, though.
> nit: What does "Most users" mean? Desktop users (the majority of users)
> don't have any expectation of memory usage.
> But if an admin has a memory expectation, they should be able to tune
> for the optimal oom result.
> I think you pointed out the right thing.

This is mostly referring to production server users where memory
consumption by particular applications can be estimated, which allows the
kernel to determine when a task is using a wildly unexpected amount that
happens to become egregious enough to force the oom killer into killing a
task.

That is in contrast to using rss as a baseline, where we prefer killing the
application with the most resident RAM. It is not always ideal to kill a
task with 8GB of rss when we fail to allocate a single page for a low
priority task.
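
To make the "shift on total_vm" idea concrete, here is a minimal sketch of
how an oom_adj value could scale a baseline score. It is an illustration
only, not the kernel's badness() function: adjusted_badness() is a
hypothetical helper, the baseline is assumed to be total_vm in pages, and
the constants are the oom_adj range discussed in this thread.

```c
#include <assert.h>

/* oom_adj range as discussed in this thread. */
#define OOM_DISABLE     (-17)
#define OOM_ADJUST_MIN  (-16)
#define OOM_ADJUST_MAX  15

/*
 * Hypothetical helper: apply oom_adj as a bit shift on a baseline
 * (assumed here to be total_vm in pages). A return value of 0 for
 * OOM_DISABLE stands for "never select this task".
 */
static unsigned long long adjusted_badness(unsigned long long baseline,
                                           int oom_adj)
{
    assert(oom_adj >= OOM_DISABLE && oom_adj <= OOM_ADJUST_MAX);

    if (oom_adj == OOM_DISABLE)
        return 0;

    if (oom_adj > 0) {
        /* Make sure a positive adjustment always has something to scale. */
        if (baseline == 0)
            baseline = 1;
        return baseline << oom_adj;
    }

    /* Negative (or zero) adjustment: shrink the score, or leave it as-is. */
    return baseline >> -oom_adj;
}
```

With this sketch, a task with a baseline of 1000 pages scores 8000 at
oom_adj +3 and 125 at oom_adj -3, which shows why a shift is a coarse unit:
each step doubles or halves the score.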

> > - we must prefer tasks that run on a cpuset or mempolicy's nodes if the
> > oom condition is constrained by that cpuset or mempolicy and it's not a
> > system-wide issue.
> agreed. (who would disagree?)

It's possible to nullify the current penalization in the badness heuristic
(order 3 reduction) if a candidate task does not share nodes with
current's allowed set either by way of cpusets or mempolicies. For
example, an oom caused by an application with an MPOL_BIND on a single
node can easily kill a task that has no memory resident on that node if
its usage (or rss) is 3 orders higher than any candidate that is allowed
on my bound node.
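
A small sketch of that failure mode, under the assumption that the "order 3
reduction" is a right shift by 3 (a divide by 8) applied to tasks that share
no nodes with current. The struct, the helper names and the numbers are
hypothetical and only illustrate the arithmetic; this is not the kernel's
selection code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical candidate description for illustration. */
struct candidate {
    const char *name;
    unsigned long usage_pages; /* baseline: total_vm or rss, in pages */
    int shares_nodes;          /* nonzero if it intersects current's nodes */
};

/* Assumed penalty: shift the score right by 3 if no nodes are shared. */
static unsigned long penalized_score(const struct candidate *c)
{
    return c->shares_nodes ? c->usage_pages : c->usage_pages >> 3;
}

/* Return the index of the highest-scoring candidate (first one on ties). */
static size_t select_victim(const struct candidate *c, size_t n)
{
    size_t best = 0;

    assert(n > 0);
    for (size_t i = 1; i < n; i++) {
        if (penalized_score(&c[i]) > penalized_score(&c[best]))
            best = i;
    }
    return best;
}
```

Given a remote task using 9000 pages and a local task using 1000 pages, the
remote task still scores 1125 after the penalty and is selected, even though
killing it frees nothing on the bound node. At 7000 pages it scores 875 and
the local task is selected instead.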

> > - we must be able to polarize the badness heuristic to always select a
> > particular task if it's very low priority or disable oom killing for
> > a task if it's must-run.
> Probably I haven't caught your point. What does "polarize" mean? Can you
> please describe more?

We need to be able to polarize tasks so they are always killed regardless
of any kernel heuristic (/proc/pid/oom_adj of +15, currently) or always
chosen last (-16, currently). We also need a way of completely disabling
oom killing for certain tasks such as with OOM_DISABLE.
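
As an illustration of how userspace would polarize a task through the
interface mentioned above, here is a minimal sketch that writes a value to
/proc/<pid>/oom_adj. The helper names oom_adj_path() and set_oom_adj() are
hypothetical, and the accepted range (-17 for OOM_DISABLE up to +15) is the
one discussed in this thread; lowering the value normally requires
privileges.

```c
#include <assert.h>
#include <stdio.h>

#define OOM_DISABLE     (-17)  /* never oom kill this task */
#define OOM_ADJUST_MIN  (-16)  /* always chosen last */
#define OOM_ADJUST_MAX  15     /* always chosen first */

/* Hypothetical helper: build the procfs path for a pid. */
static int oom_adj_path(int pid, char *buf, size_t len)
{
    assert(buf != NULL);
    return snprintf(buf, len, "/proc/%d/oom_adj", pid);
}

/*
 * Hypothetical helper: write an oom_adj value to the given path.
 * Returns 0 on success, -1 on an out-of-range value or an I/O error.
 */
static int set_oom_adj(const char *path, int value)
{
    FILE *f;
    int ok;

    if (value < OOM_DISABLE || value > OOM_ADJUST_MAX)
        return -1;

    f = fopen(path, "w");
    if (!f)
        return -1;

    ok = fprintf(f, "%d\n", value) > 0;
    if (fclose(f) != 0)
        ok = 0;

    return ok ? 0 : -1;
}
```

A must-run daemon would call set_oom_adj() on its own path with OOM_DISABLE,
while a low-priority batch job could be given OOM_ADJUST_MAX so that it is
always the first candidate.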