Re: [PATCH 2.6.16.29 1/1] MM: enhance Linux swap subsystem

From: yunfeng zhang
Date: Thu Jan 11 2007 - 03:11:33 EST


2007/1/11, Rik van Riel <riel@xxxxxxxxxx>:

Have you actually measured this?

If your measurements saw any performance gains, with what kind
of workload did they happen, how big were they and how do you
explain those performance gains?

How do you balance scanning the private memory with taking
pages off the per-zone page lists?

How do you deal with systems where some zones are really
large and other zones are really small, eg. a 32 bit system
with one 880MB zone and one 15.1GB zone?


You didn't mention another problem, a VMA maybe residents on multiple zones. To
multiple address space, multiple memory inode architecture, we can introduce
a new core object -- section which has several features
1) Section is used as the atomic unit to contain the pages of a VMA residing in
the memory inode of the section.
2) When page migration occurs among different memory inodes, new secion should
be set up to trace the pages.
3) Section can be scanned by the SwapDaemon of its memory inode directely.
4) All sections of a VMA are excluded with each other not overlayed.
5) VMA is made up of sections totally, but its section objects scatter on memory
inodes.
So to the architecture, we can deploy swap subsystem on an
architecture-independent layer by section and scan pages batchly.

If the benefits come mostly from better IO clustering, would
it not be safer/less invasive to add swapout clustering of the
same style that the BSD kernels have?


You mean add a new swap file/partition? I scan every UserSpaces by
init_mm::mmlist which has already been used by swap subsystem, and I've patched
in mm/swapfile.c.

For your reference, the BSD kernels do swapout clustering like this:
1) select a page off the end of the pageout list
2) then scan the page table the page is in, to find
nearby pages that are also eligable for pageout
3) page them all out with one disk I/O operation

The same could also be done for files.

With peterz's dirty tracking (and possible dirty limiting)
code in the kernel, this can be done without the kind of
deadlocks that would have plagued earlier kernels, when
trying to do IO trickery from the pageout path...


I've noticed FreeBSD maybe has a similar design as I mentioned, I'm reading
their code to whether they have another two features -- UnmappedPTE and dftlb in
my Documentation/vm_pps.txt.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/