Re: [RFC PATCH 0/3] Avoid the use of congestion_wait under zone pressure

From: Christian Ehrhardt
Date: Wed Apr 21 2010 - 00:24:12 EST

Next message: Bjorn Helgaas: "Re: PCIe hotplug failure"
Previous message: Ashwin Ganti: "Re: [PATCH 3/3] p9auth: add p9auth driver"
In reply to: Rik van Riel: "Re: [RFC PATCH 0/3] Avoid the use of congestion_wait under zone pressure"
Next in thread: Christian Ehrhardt: "Re: [RFC PATCH 0/3] Avoid the use of congestion_wait under zone pressure"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

Rik van Riel wrote:

On 04/20/2010 11:32 AM, Johannes Weiner wrote:

The idea is that it pans out on its own. If the workload changes, new
pages get activated and when that set grows too large, we start shrinking
it again.

Of course, right now this unscanned set is way too large and we can end
up wasting up to 50% of usable page cache on false active pages.

Thing is, changing workloads often change back.

Specifically, think of a desktop system that is doing
work for the user during the day and gets backed up
at night.

You do not want the backup to kick the working set
out of memory, because when the user returns in the
morning the desktop should come back quickly after
the screensaver is unlocked.

IMHO it is fine to prevent that nightly backup job from not being finished when the user arrives at morning because we didn't give him some more cache - and e.g. a 30 sec transition from/to both optimized states is fine.
But eventually I guess the point is that both behaviors are reasonable to achieve - depending on the users needs.

What we could do is combine all our thoughts we had so far:
a) Rik could create an experimental patch that excludes the in flight pages
b) Johannes could create one for his suggestion to "always scan active file pages but only deactivate them when the ratio is off and otherwise strip buffers of clean pages"
c) I would extend the patch from Johannes setting the ratio of active/inactive pages to be a userspace tunable

a,b,a+b would then need to be tested if they achieve a better behavior.

c on the other hand would be a fine tunable to let administrators (knowing their workloads) or distributions (e.g. different values for Desktop/Server defaults) adapt their installations.

In theory a,b and c should work fine together in case we need all of them.

The big question is, what workload suffers from
having the inactive list at 50% of the page cache?

So far the only big problem we have seen is on a
very unbalanced virtual machine, with 256MB RAM
and 4 fast disks. The disks simply have more IO
in flight at once than what fits in the inactive
list.

Did I get you right that this means the write case - explaining why it is building up buffers to the 50% max?

Note: It even uses up to 64 disks, with 1 disk per thread so e.g. 16 threads => 16 disks.

For being "unbalanced" I'd like to mention that over the years I learned that sometimes, after a while, virtualized systems look that way without being intended - this happens by adding more and more guests and let guest memory balooning take care of it.

This is a very untypical situation, and we can
probably solve it by excluding the in-flight pages
from the active/inactive file calculation.

--

GrÃsse / regards, Christian Ehrhardt
IBM Linux Technology Center, System z Linux Performance
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Next message: Bjorn Helgaas: "Re: PCIe hotplug failure"
Previous message: Ashwin Ganti: "Re: [PATCH 3/3] p9auth: add p9auth driver"
In reply to: Rik van Riel: "Re: [RFC PATCH 0/3] Avoid the use of congestion_wait under zone pressure"
Next in thread: Christian Ehrhardt: "Re: [RFC PATCH 0/3] Avoid the use of congestion_wait under zone pressure"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]