Re: [PATCH] oom_kill: use rss value instead of vm size for badness

From: David Rientjes
Date: Wed Nov 25 2009 - 16:40:25 EST


On Wed, 25 Nov 2009, Andrea Arcangeli wrote:

> You're focusing on the noise and not looking at the only thing that
> matters.
>
> The noise level with rss went down to 50000; the order of whatever is
> below 50000 doesn't matter. The only thing that matters is the _delta_
> between "noise-level innocent apps" and "exploit".
>
> The delta clearly increases, from 708945-max(noise) to
> 707878-max(noise), which translates to an increase of precision from
> 513250 to 665677. That shows how much more accurate rss makes the
> detection (i.e. the distance between the exploit and the first
> innocent app). The lower the noise level starts, the less likely
> innocent apps are to be killed.
>

That's not surprising since the amount of physical RAM is the constraining
factor.

> There's simply no way to get to perfection, some innocent apps will
> always have high total_vm or rss levels, but this at least removes
> lots of innocent apps from the equation. The fact X isn't less
> innocent than before is because its rss is quite big, and this is not
> an error, luckily much smaller than the hog itself. Surely there are
> ways to force X to load huge bitmaps into its address space too
> (regardless of total_vm or rss) but again no perfection, just better
> with rss even in this testcase.
>

We use the oom killer as a mechanism to enforce memory containment policy,
so we are much more interested in the oom killing priority than in the oom
killer's own heuristics for determining the ideal task to kill. Those
heuristics can't possibly represent the priorities for all possible
workloads, so we require input from the user via /proc/pid/oom_adj to
adjust that heuristic. That adjustment has traditionally always used
total_vm as a baseline, which is a much more static value: experimental
data can quantify, within a reasonable range, how large it gets when the
task is not rogue. By changing the baseline to rss, we lose much of that
control, since it's more dynamic and depends on the state of the machine
at the time of the oom, which can be predicted with less accuracy.
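
To make the oom_adj point concrete, here is a minimal userspace sketch of a
bitshift-style adjustment applied to a baseline score. It is loosely
modeled on how oom_adj scales the badness score and is not the kernel's
code: the helper name adjusted_badness() and the sample page counts used
below are hypothetical, chosen only for illustration.

```c
#include <assert.h>

#define OOM_DISABLE     (-17)  /* oom_adj value that exempts a task entirely */
#define OOM_ADJUST_MAX  15     /* largest oom_adj value accepted */

/*
 * Hypothetical helper: scale a baseline score (total_vm or rss, in pages)
 * by oom_adj. Positive values shift the score left (more likely to be
 * killed), negative values shift it right (less likely).
 */
unsigned long adjusted_badness(unsigned long baseline, int oom_adj)
{
	assert(oom_adj >= OOM_DISABLE && oom_adj <= OOM_ADJUST_MAX);

	if (oom_adj == OOM_DISABLE)
		return 0;		/* never selected */

	if (oom_adj > 0) {
		if (!baseline)
			baseline = 1;	/* keep a zero score from staying zero */
		return baseline << oom_adj;
	}

	return baseline >> -oom_adj;	/* oom_adj <= 0 */
}
```

With a total_vm baseline that is known to stay near 200000 pages for a
well-behaved task, oom_adj = -2 always yields a score of 50000, so the
priority relative to other tasks can be planned in advance. With an rss
baseline that swings with machine state, the same oom_adj yields a score
that swings by the same factor, which is the loss of control described
above.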