Re: [PATCH] RFC: vmscan: add min_filelist_kbytes sysctl for protecting the working set

From: Mandeep Singh Baines
Date: Mon Nov 01 2010 - 14:24:51 EST


KOSAKI Motohiro (kosaki.motohiro@xxxxxxxxxxxxxx) wrote:
> Hi
>
> > On ChromiumOS, we do not use swap. When memory is low, the only way to
> > free memory is to reclaim pages from the file list. This results in a
> > lot of thrashing under low memory conditions. We see the system become
> > unresponsive for minutes before it eventually OOMs. We also see very
> > slow browser tab switching under low memory. Instead of an unresponsive
> > system, we'd really like the kernel to OOM as soon as it starts to
> > thrash. If it can't keep the working set in memory, then OOM.
> > Losing one of many tabs is a better behaviour for the user than an
> > unresponsive system.
> >
> > This patch creates a new sysctl, min_filelist_kbytes, which disables reclaim
> > of file-backed pages when there are less than min_filelist_kbytes worth
> > of such pages in the cache. This tunable is handy for low memory systems
> > using solid-state storage where interactive response is more important
> > than not OOMing.
> >
> > With this patch and min_filelist_kbytes set to 50000, I see very little
> > block layer activity during low memory. The system stays responsive under
> > low memory and browser tab switching is fast. Eventually, a process gets
> > killed by OOM. Without this patch, the system gets wedged for minutes
> > before it eventually OOMs. Below is the vmstat output from my test runs.
>
> I've heard similar requirements from embedded people from time to time. They also
> don't use swap. So I don't think this is a hopeless idea, but I'd like to
> clarify some things first.
>

Swap would be interesting if we could somehow control swap thrashing. Maybe
we could add min_anonlist_kbytes. Just kidding:)

> Yes, a system often has file caches that should not be evicted. Typically, they
> are libc, libX11 and some GUI libraries. Traditionally, we made a tiny
> application which linked the above important libs and called mlockall() at startup.
> Such a technique prevents reclaim. So, Q1: Why do you think the above traditional way
> is insufficient?
>

mlockall() is too coarse-grained. It locks the whole of each mapped file in memory.
The chrome and X binaries are quite large so locking them would waste a lot
of memory. We could lock just the pages that are part of the working set but
that is difficult to do in practice. It's unmaintainable if you do it
statically. If you do it at runtime by mlocking the working set, you're
sort of giving up on mm's active list.

Like akpm, I'm sad that we need this patch. I'd rather the kernel did a better
job of identifying the working set. We did look at ways to do a better
job of keeping the working set in the active list but these were trickier
patches and never quite worked out. This patch is simple and works great.

Under memory pressure, I see the active list get smaller and smaller. It's
getting smaller because we're scanning it faster and faster, causing more
and more page faults which slows forward progress resulting in the active
list getting smaller still. One way to approach this might be to make the
scan rate constant and configurable. It doesn't seem right that we scan
memory faster and faster under low memory. For us, we'd rather OOM than
evict pages that are likely to be accessed again so we'd prefer to make
a conservative estimate as to what belongs in the working set. Other
folks (long computations) might want to reclaim more aggressively.

> Q2: In the above you used min_filelist_kbytes=50000. How did you decide on
> that value? Can other users calculate a proper value?
>

50M was small enough that we were comfortable with keeping 50M of file pages
in memory and large enough that it is bigger than the working set. I tested
by loading up a bunch of popular web sites in chrome and then observing what
happened when I ran out of memory. With 50M, I saw almost no thrashing and
the system stayed responsive even under low memory. But I wanted to be
conservative since I'm really just guessing.
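
As a rough user-space illustration of the threshold (not the kernel patch
itself), the check can be mimicked by summing the file LRU sizes reported in
/proc/meminfo and comparing them with the tunable. The function names are
hypothetical, and treating "file list" as Active(file) + Inactive(file) is an
assumption; the patch may count pages differently.

```python
def file_list_kb(meminfo_text):
    """Return Active(file) + Inactive(file) in kB from /proc/meminfo text.

    Assumption: the sum of both file LRU lists approximates the "file list"
    that min_filelist_kbytes is compared against.
    """
    wanted = {"Active(file)", "Inactive(file)"}
    total = 0
    for line in meminfo_text.splitlines():
        name, _, rest = line.partition(":")
        if name.strip() in wanted:
            # The value looks like "   30000 kB"; take the number only.
            total += int(rest.split()[0])
    return total


def below_min_filelist(meminfo_text, min_filelist_kbytes=50000):
    """True if the file lists hold fewer kB than the threshold."""
    return file_list_kb(meminfo_text) < min_filelist_kbytes
```

On a live system the text would come from open("/proc/meminfo").read(); the
default of 50000 only mirrors the value used in the test runs above.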

Other users could calculate their value by doing something similar. Load
up the system (exhaust free memory) with a typical load and then observe
file I/O via vmstat. They can then set min_filelist_kbytes to the value
where they see a tolerable amount of thrashing (page faults, block I/O).
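
The observation step above could be scripted along these lines: take two
snapshots of /proc/vmstat and look at how fast major faults and block I/O
are growing. This is only a sketch; the helper names are made up, and which
counters best indicate thrashing on a given system is an assumption
(pgmajfault, pgpgin and pgpgout are used here).

```python
# Counters from /proc/vmstat taken here as a proxy for thrashing (assumption).
COUNTERS = ("pgmajfault", "pgpgin", "pgpgout")


def parse_vmstat(text):
    """Extract the counters of interest from /proc/vmstat-style text."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in COUNTERS:
            stats[parts[0]] = int(parts[1])
    return stats


def counter_deltas(before, after):
    """Per-counter increase between two parsed snapshots."""
    return {
        name: after[name] - before[name]
        for name in COUNTERS
        if name in before and name in after
    }
```

Sampling /proc/vmstat a second apart at different min_filelist_kbytes
settings and comparing the deltas gives a crude picture of where the
thrashing becomes tolerable.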

> In addition, I have two requests. R1: I think a Chromium-specific feature is
> harder to accept because it's harder to maintain. But we have a good chance to
> solve a generic embedded issue. Please discuss with Minchan and/or other embedded

I think this feature should be useful to a lot of embedded applications where
OOM is OK, especially web browsing applications where the user is OK with
losing 1 of many tabs they have open. However, I consider this patch a
stop-gap. I think the real solution is to do a better job of protecting
the active list.

> developers. R2: If you want to deal with OOM in combination, please consider
> combining it with the memcg OOM notifier too. It is the most flexible and powerful OOM
> mechanism. Probably desktop and server people never use the bare OOM killer intentionally.
>

Yes, will definitely look at OOM notifier. Currently trying to see if we can
get by with oom_adj. With OOM notifier you'd have to respond earlier so you
might OOM more. However, with a notifier you might be able to take action that
might prevent OOM altogether.
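
For reference, steering the OOM killer toward a particular process (say, a
background tab) from user space might look roughly like the sketch below. It
assumes the newer /proc/<pid>/oom_score_adj interface (range -1000 to 1000)
rather than oom_adj; the helper name and the proc_root parameter are
hypothetical, the latter only there to make the function testable.

```python
import os


def set_oom_score_adj(pid, value, proc_root="/proc"):
    """Write an OOM score adjustment for pid; higher means killed sooner.

    Assumption: the kernel exposes oom_score_adj, which accepts values
    from -1000 to 1000.
    """
    if not -1000 <= value <= 1000:
        raise ValueError("oom_score_adj must be between -1000 and 1000")
    path = os.path.join(proc_root, str(pid), "oom_score_adj")
    with open(path, "w") as f:
        f.write(str(value))
    return path
```

A browser could raise the value for renderer processes of tabs that have not
been used recently, so that they are the first to go when the kernel OOMs.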

I see memcg more as an isolation mechanism but I guess you could use it to
isolate the working set from anon browser tab data as Kamezawa suggests.

Regards,
Mandeep

> Thanks.
>
>
>