Re: [RFC][PATCH] oom_kill: avoid depends on total_vm and use realRSS/swap value for oom_score (Re: Memory overcommit

From: KAMEZAWA Hiroyuki
Date: Tue Oct 27 2009 - 20:31:40 EST


On Tue, 27 Oct 2009 18:39:07 +0000 (GMT)
Hugh Dickins <hugh.dickins@xxxxxxxxxxxxx> wrote:

> On Tue, 27 Oct 2009, KAMEZAWA Hiroyuki wrote:
> > Currently, the oom-killer's score uses mm->total_vm as its base value.
> > But these days, applications such as GUI programs tend to use
> > many shared libraries, and total_vm grows too high even when
> > the pages are not fully mapped.
> >
> > For example, when running a program "mmap" which allocates 1 GByte of
> > anonymous memory, the top 10 oom_scores on the system will be:
> >
> >  score   PID name
> >  89924  3938 mixer_applet2
> >  90210  3942 tomboy
> >  94753  3936 clock-applet
> > 101994  3919 pulseaudio
> > 113525  4028 gnome-terminal
> > 127340     1 init
> > 128177  3871 nautilus
> > 151003 11515 bash
> > 256944 11653 mmap <----------------- uses 1G of anon
> > 425561  3829 gnome-session
> >
> > No one believes gnome-session is more guilty than "mmap".
> >
> > Instead of total_vm, we should use anon/file/swap usage of a process, I think.
> > This patch adds mm->swap_usage and calculates oom_score based on
> > anon_rss + file_rss + swap_usage.
> > For typical applications, this will be much better information than
> > total_vm. After this patch, the scores on my desktop are:
> >
> >  score  PID name
> >   4033 3176 gnome-panel
> >   4077 3113 xinit
> >   4526 3190 python
> >   4820 3161 gnome-settings-
> >   4989 3289 gnome-terminal
> >   7105 3271 tomboy
> >   8427 3177 nautilus
> >  17549 3140 gnome-session
> > 128501 3299 bash
> > 256106 3383 mmap
> >
> > This order is not bad, I think.
> >
> > Note: This adds a new counter, so some new cost is added.
>
> I've often thought we ought to supply such a swap_usage statistic;
> and show it in /proc/pid/statsomething, presumably VmSwap in
> /proc/pid/status, even an additional field on the end of statm.
>
Hm, ok. I'll divide this patch into

- replace total_vm with anon_rss + file_rss (everyone will agree with this.)
- add swap usage accounting
- show it via /proc (its format may need discussion.)
- use the value in the oom calculation (needs discussion)
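
To make the last two items a bit more concrete, here is a rough userspace
sketch in Python (not part of the patch). It assumes /proc/<pid>/status
gains a VmSwap line next to the existing VmRSS, as Hugh suggests; the
function names and the sample numbers are made up for illustration only.

```python
def parse_status_kb(status_text):
    """Collect the 'Vm*: <n> kB' lines of /proc/<pid>/status-style text."""
    fields = {}
    for line in status_text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep or not name.startswith("Vm"):
            continue
        parts = rest.split()
        # Only accept well-formed "<digits> kB" values.
        if len(parts) >= 2 and parts[0].isdigit() and parts[1] == "kB":
            fields[name] = int(parts[0])
    return fields


def badness_base_kb(fields, use_swap=True):
    """Hypothetical base value: resident memory, optionally plus swap usage."""
    rss = fields.get("VmRSS", 0)
    swap = fields.get("VmSwap", 0) if use_swap else 0
    return rss + swap


# Illustrative status text; VmSwap is the assumed new field.
SAMPLE = """\
Name:\tmmap
VmSize:\t 1052000 kB
VmRSS:\t  700000 kB
VmSwap:\t  348576 kB
"""

if __name__ == "__main__":
    f = parse_status_kb(SAMPLE)
    print("total_vm base :", f["VmSize"], "kB")
    print("rss only      :", badness_base_kb(f, use_swap=False), "kB")
    print("rss + swap    :", badness_base_kb(f), "kB")
```

The use_swap switch is there only because whether swap should be counted
is still an open question.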

> A slight new cost, yes: doesn't matter at the swapping end, but
> would slightly impact fork and exit - I do hope we can afford it,
> because I think it should have been available all along.
>
fork()/exit() use batched counting, so we should not see overhead there.
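
As a rough illustration of what batching means here (a Python sketch, not
the kernel code; the class and function names are hypothetical): the walk
over a range of entries counts into local variables and touches the
shared per-mm counters only once at the end, so one more counter adds one
more local increment rather than one more shared update per page.

```python
class MmCounters:
    """Stand-in for the per-mm counters discussed in this thread."""

    def __init__(self):
        self.anon_rss = 0
        self.file_rss = 0
        self.swap_usage = 0
        self.updates = 0  # how many times the shared counters were touched

    def add(self, anon, file, swap):
        self.anon_rss += anon
        self.file_rss += file
        self.swap_usage += swap
        self.updates += 1


def copy_range(entries, child):
    """Count entry kinds locally, then fold them into 'child' in one update."""
    anon = file = swap = 0
    for kind in entries:
        if kind == "anon":
            anon += 1
        elif kind == "file":
            file += 1
        elif kind == "swap":
            swap += 1
        # Any other kind (e.g. an empty entry) is not counted.
    child.add(anon, file, swap)


if __name__ == "__main__":
    mm = MmCounters()
    copy_range(["anon", "anon", "swap", "file", "none"], mm)
    print(mm.anon_rss, mm.file_rss, mm.swap_usage, mm.updates)
```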


> I've not checked your patch in detail; but I do agree that basing
> OOM (physical memory) decisions on total_vm (virtual memory) has
> seemed weird, so it's well worth trying this approach. Whether swap
> should be included along with rss isn't quite clear to me: I'm not
> saying you're wrong, not at all, just that it's not quite obvious.
>
Yes. It just comes from heuristics. It will need discussion/investigation/theory.


> I've several observations to make about bad OOM kill decisions,
> but it's probably better that I make them in the original
> "Memory overcommit" thread, rather than divert this thread.
>

Thanks,
-Kame

