Re: [patch 4/7 -mm] oom: badness heuristic rewrite

From: David Rientjes
Date: Thu Feb 11 2010 - 18:31:28 EST


On Thu, 11 Feb 2010, Andrew Morton wrote:

> > > > Sigh, this is going to require the amount of system memory to be
> > > > partitioned into OOM_ADJUST_MAX, 15, chunks and that's going to be the
> > > > granularity at which we'll be able to either bias or discount memory usage
> > > > of individual tasks by: instead of being able to do this with 0.1%
> > > > granularity we'll now be limited to 100 / 15, or ~7%. That's ~9GB on my
> > > > 128GB system just because this was originally a bitshift. The upside is
> > > > that it's now linear and not exponential.
> > >
> > > Can you add newly-named knobs (rather than modifying the existing
> > > ones), deprecate the old ones and then massage writes to the old ones
> > > so that they talk into the new framework?
> > >
> >
> > That's what I was thinking, add /proc/pid/oom_score_adj that is just added
> > into the badness score (and is then exported with /proc/pid/oom_score)
> > like this patch did with oom_adj and then scale it into oom_adj units for
> > that tunable. A write to either oom_adj or oom_score_adj would change the
> > other,
>
> How ugly is all this?
>

The advantages outweigh the disadvantages: users need to be able to
specify how much memory vital tasks may use relative to others without
being penalized, and that needs to be expressed as a fraction of
available memory. I originally wanted to avoid introducing another
tunable, but I understand the need for a stable ABI and backwards
compatibility. The way /proc/pid/oom_adj currently works, as a bitshift
on the badness score, is nearly impossible to tune correctly, so a
change in scoring is inevitable. Luckily, users who tune either one can
ignore the other until such time as oom_adj can be removed.
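
To make the difference concrete, here is a minimal userspace sketch, not
the kernel code from this patch. It assumes the badness score and
oom_score_adj are both expressed in thousandths of available memory (the
0.1% granularity mentioned above). The names SCORE_ADJ_MAX, step_bytes(),
badness_shift(), badness_linear() and oom_adj_to_score_adj() are
hypothetical, as are the clamping to [0, 1000] and the exact scaling
between the two tunables. Special oom_adj values are not handled.

```c
#include <assert.h>
#include <stdint.h>

#define OOM_ADJUST_MAX 15    /* maximum oom_adj value, as quoted above */
#define SCORE_ADJ_MAX  1000  /* hypothetical: units of 0.1% of memory */

/* Smallest amount of memory that one unit of a tunable can express. */
static uint64_t step_bytes(uint64_t total_bytes, unsigned int units)
{
	assert(units > 0);
	return total_bytes / units;
}

/* Current behaviour as described: oom_adj bitshifts the badness score. */
static uint64_t badness_shift(uint64_t points, int oom_adj)
{
	if (oom_adj >= 0)
		return points << oom_adj;
	return points >> -oom_adj;
}

/*
 * Proposed behaviour: the adjustment is simply added to the score.
 * Clamping to [0, SCORE_ADJ_MAX] is an assumption of this sketch.
 */
static long badness_linear(long points, long score_adj)
{
	long score = points + score_adj;

	if (score < 0)
		return 0;
	if (score > SCORE_ADJ_MAX)
		return SCORE_ADJ_MAX;
	return score;
}

/* Hypothetical linear scaling of an old oom_adj write into new units. */
static long oom_adj_to_score_adj(int oom_adj)
{
	return (long)oom_adj * SCORE_ADJ_MAX / OOM_ADJUST_MAX;
}
```

With 128GB of memory, step_bytes() gives about 8.5GB per step for 15
steps and about 131MB per step for 1000 steps. The shift variant
multiplies the score by 8 at oom_adj = 3, while the linear variant moves
it by a fixed fraction of memory regardless of the task's current usage.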

> There _are_ things we can do though. Detect a write to the old file and
> emit a WARN_ON_ONCE("you suck"). Wait a year, turn it into
> WARN_ON("you really suck"). Wait a year, then remove it.
>

Ok, I'll use WARN_ON_ONCE() to let the user know of the deprecation and
then add an entry to Documentation/feature-removal-schedule.txt.
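
The warn-once behaviour can be modelled in userspace roughly as follows.
This is only an illustration of the pattern, not the kernel macro: the
kernel's WARN_ON_ONCE() takes a condition rather than a message, and the
helper name warn_deprecated_once() and the message text are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Emit a deprecation notice the first time it is called and stay
 * silent afterwards. Returns true only when the notice was printed.
 */
static bool warn_deprecated_once(const char *name)
{
	static bool warned;	/* zero-initialized, persists across calls */

	assert(name != NULL);
	if (warned)
		return false;
	warned = true;
	fprintf(stderr, "%s is deprecated, use oom_score_adj instead\n", name);
	return true;
}
```

A write handler for the old file would call warn_deprecated_once("oom_adj")
on every write, and only the first call would produce output.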