Re: Linux killed Kenny, bastard!

From: David Rientjes
Date: Tue Jan 13 2009 - 04:54:36 EST


On Tue, 13 Jan 2009, Evgeniy Polyakov wrote:

> It is a theory, not a practice. OOM-killer most of time starts from ssh,
> database and lighttpd on the tested machines, when it could start in
> the reverse order and do not touch ssh at all. Better not from daemon
> itself, but its fastcgi spawned processes.
>

In the unconstrained system-wide oom case, it scans each task on the
system (which can take very long, ask SGI) and rates its badness scoring.
When a memory-hogging task is identified, which you have complete control
over in userspace by tuning /proc/pid/oom_adj, it attempts to kill a child
first if it will allow for memory freeing without killing the parent.

> I agree, that there are ways to tune the way oom-killer selects the
> victim, and likely after hours of games this subtly will work for the
> specified workload.

It doesn't involve "hours of games," it is a very simple heuristic that
you can easily tune to specify your preferences.

What you're looking for with your patch is simply a way to specify an oom
preference before the task has been forked, but that's simple to do with
the current logic since oom_adj scores are inherited and preference is
given to killing a child before parent.

> What I propose is the simplest way for the most
> commonly used case.

No, procfs is the correct interface for tuning oom kill preferences and
not by name parsing.

With oom_adj scores, you have the ability to specify oom kill preferences
within a cpuset or memory controller as well, whereas oom_victim_name is
global and very costly when not found in select_bad_process().

> It is a help for the admin and not the force to
> invent complex machinery which will be error-prone and hard to debug
> when eventually oom happens.

It's very simple to debug the oom killer's decisions, which is why I
introduced /proc/sys/vm/oom_dump_tasks.

It also requires two expensive scans of the entire tasklist (I introduced
/proc/sys/vm/oom_kill_allocating_task specifically to avoid _one_
expensive scan) when oom_victim_name isn't found.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/