Re: [PATCH] RFC: vmscan: add min_filelist_kbytes sysctl for protecting the working set

From: Rik van Riel
Date: Mon Nov 01 2010 - 23:12:04 EST

In reply to: Mandeep Singh Baines: "Re: [PATCH] RFC: vmscan: add min_filelist_kbytes sysctl for protecting the working set"
Next in thread: Minchan Kim: "Re: [PATCH] RFC: vmscan: add min_filelist_kbytes sysctl for protecting the working set"

On 11/01/2010 03:43 PM, Mandeep Singh Baines wrote:

> Yes, this prevents you from reclaiming the active list all at once. But if
> the memory pressure doesn't go away, you'll start to reclaim the active list
> little by little. First you'll empty the inactive list, and then you'll
> start scanning the active list and pulling pages from active to inactive.
> The problem is that there is no minimum time limit on how long a page will
> sit in the inactive list before it is reclaimed. It just depends on the scan
> rate, which does not depend on time.
>
> In my experiments, I saw the active list get smaller and smaller over time
> until eventually it was only a few MB, at which point the system came
> grinding to a halt due to thrashing.

I believe that changing the active/inactive ratio has other
potential thrashing issues. Specifically, when the inactive
list is too small, pages may not stick around long enough to
be accessed multiple times and get promoted to the active
list, even when they are in active use.

I would prefer a more flexible solution that automatically
does the right thing.

The problem you see is that the file list gets reclaimed
very quickly, even when it is already very small.

I wonder if a possible solution would be to limit how fast
file pages get reclaimed, when the page cache is very small.
Say, inactive_file + active_file < 2 * zone->pages_high ?
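
To make the proposed check concrete, here is a minimal userspace sketch.
It reads the condition as the total number of file pages (inactive plus
active) compared against twice the zone's high watermark. The struct
zone_counts type and the file_cache_is_low() helper are hypothetical
names used only for illustration; they are not kernel interfaces.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the per-zone counters (not the kernel's struct zone). */
struct zone_counts {
    unsigned long inactive_file; /* pages on the inactive file LRU */
    unsigned long active_file;   /* pages on the active file LRU */
    unsigned long pages_high;    /* high watermark, in pages */
};

/*
 * Returns true when the page cache is small enough that file reclaim
 * should be slowed down. All three quantities are page counts, so the
 * comparison is between like units.
 */
static bool file_cache_is_low(const struct zone_counts *z)
{
    return z->inactive_file + z->active_file < 2 * z->pages_high;
}
```

With pages_high = 100, a zone holding 150 file pages would count as low,
while one holding 1500 file pages would not. The factor of 2 is the
tentative threshold from the text and may well need tuning.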

At that point, maybe we could slow down the reclaiming of
page cache pages to be significantly slower than they can
be refilled by the disk. Maybe 100 pages a second - that
can be refilled even by an actual spinning metal disk
without even the use of readahead.

That can be rounded up to one batch of SWAP_CLUSTER_MAX
file pages every 1/4 second, when the number of page cache
pages is very low.
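
The arithmetic works out as follows: with SWAP_CLUSTER_MAX at 32 pages,
one batch every 1/4 second is 32 * 4 = 128 pages a second, slightly
above the 100 pages a second target. A rough userspace sketch of such a
throttle is below. The file_reclaim_throttle struct, the
file_reclaim_quota() function, and the millisecond clock passed in by
the caller are all hypothetical; a kernel version would presumably use
jiffies and per-zone state instead.

```c
#include <stdbool.h>
#include <stdint.h>

#define SWAP_CLUSTER_MAX      32UL /* reclaim batch size, assumed to be 32 pages */
#define LOW_CACHE_INTERVAL_MS 250  /* one batch per 1/4 second */

/* Hypothetical per-zone throttle state. */
struct file_reclaim_throttle {
    uint64_t next_allowed_ms; /* earliest time the next batch may run */
};

/*
 * Returns how many file pages may be reclaimed right now.
 * When the cache is not low, the request passes through unchanged.
 * When it is low, at most one SWAP_CLUSTER_MAX batch is granted per
 * interval and every other request in that interval gets 0.
 */
static unsigned long file_reclaim_quota(struct file_reclaim_throttle *t,
                                        uint64_t now_ms,
                                        bool cache_is_low,
                                        unsigned long want)
{
    if (!cache_is_low)
        return want;
    if (now_ms < t->next_allowed_ms)
        return 0; /* quota for this interval is already used up */
    t->next_allowed_ms = now_ms + LOW_CACHE_INTERVAL_MS;
    return want < SWAP_CLUSTER_MAX ? want : SWAP_CLUSTER_MAX;
}
```

A caller that gets 0 back would have to reclaim from somewhere else or
wait, which is exactly where the OOM killer question comes in.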

This way HPC and virtual machine hosting nodes can still
get rid of totally unused page cache, but on any system
that actually uses page cache, some minimal amount of
cache will be protected under heavy memory pressure.

Does this sound like a reasonable approach?

I realize the threshold may have to be tweaked...

The big question is, how do we integrate this with the
OOM killer? Do we pretend we are out of memory when
we've hit our file cache eviction quota and kill something?

Would there be any downsides to this approach?

Are there any volunteers for implementing this idea?
(Maybe someone who needs the feature?)

--
All rights reversed
