Re: Memory overcommit

From: David Rientjes
Date: Fri Oct 30 2009 - 15:44:30 EST

Next message: Thadeu Lima de Souza Cascardo: "[PATCH] pci: remove pci_find_slot from PCI_LEGACY config description"
Previous message: Thadeu Lima de Souza Cascardo: "[PATCH resend] misc: remove MAC pmu function declaration from misc device class"
In reply to: Hugh Dickins: "Re: Memory overcommit"
Next in thread: KAMEZAWA Hiroyuki: "Re: Memory overcommit"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Fri, 30 Oct 2009, Vedran Furac wrote:

> Well, you are kernel hacker, not me. You know how linux mm works much
> more than I do. I just reported a, what I think is a big problem, which
> needs to be solved ASAP (2.6.33).

The oom killer heuristics have not been changed recently, so why is this
suddenly a problem that needs to be immediately addressed? The heuristics
you've been referring to have been used for at least three years.

> I'm afraid that we'll just talk much
> and nothing will be done with solution/fix postponed indefinitely. Not
> sure if you are interested, but I tested this on windowsxp also, and
> nothing bad happens there, system continues to function properly.
>

I'm totally sympathetic to testcases such as your own where the oom killer
seems to react in an undesirable way. I agree that it could do a much
better job at targeting "test" and killing it without negatively impacting
other tasks.

However, I don't think we can simply change the baseline (like the rss
change which has been added to -mm (??)) and consider it a major
improvement when it severely impacts how system administrators are able to
tune the badness heuristic from userspace via /proc/pid/oom_adj. I'm sure
you'd agree that user input is important in this matter and so that we
should maximize that ability rather than make it more difficult. That's
my main criticism of the suggestions thus far (and, sorry, but I have to
look out for production server interests here: you can't take away our
ability to influence oom badness scoring just because other simple
heuristics may be more understandable).
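
To make the point about the baseline concrete, here is a rough user-space
sketch, not kernel code. It assumes a simplified model of the 2.6-era
badness adjustment: a positive oom_adj left-shifts the baseline score, a
negative one right-shifts it, and -17 exempts the task entirely. The
function name and the page counts are invented for illustration.

```python
OOM_DISABLE = -17  # oom_adj value assumed to exempt a task from oom killing


def adjusted_badness(baseline: int, oom_adj: int) -> int:
    """Toy model: scale a baseline badness score by an oom_adj bit shift."""
    if not -17 <= oom_adj <= 15:
        raise ValueError("oom_adj must be in the range -17..15")
    if oom_adj == OOM_DISABLE:
        return 0
    if oom_adj > 0:
        # A zero baseline is bumped to 1 so the shift still has an effect.
        return max(baseline, 1) << oom_adj
    return baseline >> -oom_adj


# Hypothetical task: large virtual size, much smaller resident set.
total_vm_pages = 400_000
rss_pages = 25_000

for adj in (-4, 0, 4):
    print(adj,
          adjusted_badness(total_vm_pages, adj),
          adjusted_badness(rss_pages, adj))
```

Under this model the adjustment is only a multiplier on the baseline, so an
oom_adj value that was chosen against one baseline means something
different as soon as the baseline is swapped for another.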

> > Much better is to allow the user to decide at what point, regardless of
> > swap usage, their application is using much more memory than expected or
> > required. They can do that right now pretty well with /proc/pid/oom_adj
> > without this outlandish claim that they should be expected to know the rss
> > of their applications at the time of oom to effectively tune oom_adj.
>
> Believe me, barely a few developers use oom_adj for their applications,
> and probably almost none of the end users. What should they do, every
> time they start an application, go to console and set the oom_adj. You
> cannot expect them to do that.
>

oom_adj is an extremely important part of our infrastructure and although
the majority of Linux users may not use it (I know a number of opensource
programs that tune their own, however), we can't let go of our ability to
specify an oom killing priority.
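
As a minimal sketch of how a program might tune its own value, the helper
below writes to /proc/self/oom_adj. It assumes the -17..15 range of the
oom_adj interface discussed here; the helper name and its path parameter
are hypothetical, and the path is a parameter only so the function can be
pointed at an ordinary file when experimenting.

```python
OOM_ADJ_MIN = -17  # assumed: -17 disables oom killing for the task
OOM_ADJ_MAX = 15   # assumed: most likely to be chosen by the oom killer


def set_oom_adj(value: int, path: str = "/proc/self/oom_adj") -> None:
    """Write an oom_adj value for the current process (hypothetical helper)."""
    if not OOM_ADJ_MIN <= value <= OOM_ADJ_MAX:
        raise ValueError(
            f"oom_adj must be in {OOM_ADJ_MIN}..{OOM_ADJ_MAX}, got {value}"
        )
    # The kernel parses the value as a decimal integer.
    with open(path, "w") as f:
        f.write(f"{value}\n")


# Example (not run here): volunteer this process as an earlier oom victim.
# set_oom_adj(5)
```

Raising the value for your own process should not need special privileges,
while lowering it generally requires CAP_SYS_RESOURCE, so a daemon that
wants protection typically has the adjustment made before it drops
privileges.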

There are no simple solutions to this problem: the model proposed thus
far, which has basically been to acknowledge that the oom killer is a bad
thing to encounter (but within that, some rationale was found that we can
react however we want??) and should be extremely easy to understand (just
kill the memory hogger with the most resident RAM) is a non-starter.

What would be better, and what I think we'll end up with, is a root
selectable heuristic so that production servers and desktop machines can
use different heuristics to make oom kill selections. We already have
/proc/sys/vm/oom_kill_allocating_task which I added 1-2 years ago to
address concerns specifically of SGI and their enormously long tasklist
scans. This would be a variation on that idea and would include different
simplistic behaviors (such as always killing the most memory hogging task,
killing the most recently started task by the same uid, etc), and leave
the default heuristic much the same as currently.
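
The idea of a selectable heuristic can be sketched in user space as a table
of victim-selection policies. Everything here is hypothetical (the Task
fields, the policy names, the select_victim function); it only illustrates
the simplistic behaviors listed above and does not reflect any existing
kernel interface.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Task:
    pid: int
    uid: int
    rss_pages: int
    start_time: float  # seconds since boot; larger means started later


def biggest_rss(tasks: List[Task], trigger: Task) -> Task:
    """Always kill the task with the most resident memory."""
    return max(tasks, key=lambda t: t.rss_pages)


def newest_same_uid(tasks: List[Task], trigger: Task) -> Task:
    """Kill the most recently started task owned by the triggering uid."""
    same = [t for t in tasks if t.uid == trigger.uid] or [trigger]
    return max(same, key=lambda t: t.start_time)


def allocating_task(tasks: List[Task], trigger: Task) -> Task:
    """Kill whichever task triggered the oom, skipping the tasklist scan."""
    return trigger


# Hypothetical policy table; an administrator would pick one entry by name.
HEURISTICS: Dict[str, Callable[[List[Task], Task], Task]] = {
    "biggest_rss": biggest_rss,
    "newest_same_uid": newest_same_uid,
    "allocating_task": allocating_task,
}


def select_victim(policy: str, tasks: List[Task], trigger: Task) -> Task:
    """Pick an oom victim using the named policy."""
    return HEURISTICS[policy](tasks, trigger)
```

A desktop might choose "biggest_rss" while a production server keeps a
default that still honors per-task priorities; the table only shows that
the choice can be made in one place.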
