Re: [why oom_adj does not work] Re: Linux killed Kenny, bastard!

From: Evgeniy Polyakov
Date: Tue Jan 13 2009 - 09:24:38 EST


On Tue, Jan 13, 2009 at 02:06:27PM +0000, Alan Cox (alan@xxxxxxxxxxxxxxxxxxx) wrote:
> > Do you _REALLY_ think anyone can calculate it yourself and then properly
> > calculate adjustment used to properly select oom-killed process?
>
> Its always a heuristic.

For the system which knows what it is. User does not and really can not
work with it, since there is no sane way to implement that heuristic in
the applications or even in (theoretically possible) monitor daemon.

So, effectively, oom adjustment does not work.

> > So far my patch is the sanest way to deal with the OOM selection
>
> No. You keep maintaining this but your crude hack is useless in a non
> co-operative environment, has lots of issue with name aliasing and
> doesn't deal with real needs.

It is created because of real needs. Because people need to control the
behaviour of the system and they want to control which application will
be killed to free the memory. Attached patch is not the best solution,
but it works for the all cases I can think about.

Let's take you 'name aliasing' claim: if there are several processes
with the same name, system will select the one with the worst score
according to the own magical algorithm. So it will not kill random
process just because it happend to have ricky name.

And the same applies to the other issues. It just helps system to select
the process to be killed according to userspace expectation of what
should be killed to free the memory.

> We have container interfaces that can do this and far more and do them
> right. In fact the very start of all the OpenVZ and container work years
> ago was the beancounter patches which were addressed at exactly this
> problem (although more specifically 'making sure undergraduates processes
> get killed first')

Are the beancounters used to limit amount of virtual ram and not the
physical one? It really does not work to limit for example some java
machine which will ate all virtual space swapping out different node.
It works for some (and likely the most, I do not argue this) cases and
has overhead. But we are talking not about how to limit the processes,
but what to do when we happend to have out-of-memory condition. And it
happens all the time even if you put the processes into the separate
container, since there are situations (that's why it was started at
first), when you have a huge process which should not be killed and set
of either its children or external processes, which should be checked
and some of them (administrator would like to specify the less
important) should be killed without much harm to the system.

And patch I presented allows to do it. It introduces a hint for the
killer on what processes should be checked first. It works exactly the
way people work with their system: they run different application and
expect some of them to be higher or lower priority when things come to
the oom condition. No one ever proposes to kill exactly the process we
select (although that may be a good idea in some cases), but instead to
show that oom-killer should check given group first. The group
administrator knows to be potentially harmless.

--
Evgeniy Polyakov
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/