[PATCH] fix page_alloc for larger I/O segments

From: Mark Lord
Date: Thu Dec 13 2007 - 19:40:38 EST


Mark Lord wrote:
Andrew Morton wrote:
On Thu, 13 Dec 2007 17:15:06 -0500
James Bottomley <James.Bottomley@xxxxxxxxxxxxxxxxxxxxx> wrote:

On Thu, 2007-12-13 at 14:02 -0800, Andrew Morton wrote:
On Thu, 13 Dec 2007 21:09:59 +0100
Jens Axboe <jens.axboe@xxxxxxxxxx> wrote:

OK, it's a vm issue,
cc linux-mm and probable culprit.

I have tens of thousands of "backward" pages after a
boot - IOW, bvec->bv_page is the page before bvprv->bv_page, not the
reverse. So it looks like that bug got reintroduced.
Bill Irwin fixed this a couple of years back: changed the page allocator so
that it mostly hands out pages in ascending physical-address order.

I guess we broke that, quite possibly in Mel's page allocator rework.

It would help if you could provide us with a simple recipe for
demonstrating this problem, please.
The simple way seems to be to malloc a large area, touch every page and
then look at the physical pages assigned ... they now mostly seem to be
descending in physical address.


OIC. -mm's /proc/pid/pagemap can be used to get the pfn's...
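
For anyone who wants to try the recipe above, here is a rough userspace sketch of it: allocate a region, touch every page, read the PFNs back from /proc/self/pagemap, and count how many neighbouring pages are physically ascending versus descending. It assumes the pagemap entry layout documented for current mainline kernels (bit 63 = present, bits 0-54 = PFN), which is not necessarily what the -mm patch mentioned here exposes. The function names are made up for illustration. Note that on current kernels the PFN field reads back as zero unless the process has CAP_SYS_ADMIN, so this needs to run as root.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* Assumed pagemap entry layout (one 64-bit word per virtual page):
 * bit 63 = page present, bits 0-54 = page frame number (PFN). */
#define PM_PRESENT  (1ULL << 63)
#define PM_PFN_MASK ((1ULL << 55) - 1)

/* Return the PFN held in a pagemap entry, or 0 if the page is not present. */
uint64_t pagemap_pfn(uint64_t entry)
{
    return (entry & PM_PRESENT) ? (entry & PM_PFN_MASK) : 0;
}

/* Count neighbouring virtual pages whose PFNs step by exactly +1 (asc)
 * or exactly -1 (desc). Entries with PFN 0 are treated as unknown. */
void count_order(const uint64_t *pfn, size_t n, size_t *asc, size_t *desc)
{
    *asc = *desc = 0;
    for (size_t i = 1; i < n; i++) {
        if (pfn[i - 1] == 0 || pfn[i] == 0)
            continue;
        if (pfn[i] == pfn[i - 1] + 1)
            (*asc)++;
        else if (pfn[i] + 1 == pfn[i - 1])
            (*desc)++;
    }
}

/* Allocate npages, touch each one, and store their PFNs in pfn[].
 * Returns 0 on success, -1 on any error. */
int sample_pfns(size_t npages, uint64_t *pfn)
{
    long psz = sysconf(_SC_PAGESIZE);
    void *mem = NULL;
    int ret = 0;

    assert(pfn != NULL);
    if (psz <= 0 || posix_memalign(&mem, (size_t)psz, npages * (size_t)psz) != 0)
        return -1;

    size_t page = (size_t)psz;
    volatile unsigned char *buf = mem;
    for (size_t i = 0; i < npages; i++)
        buf[i * page] = 1;          /* write to the page so it gets faulted in */

    int fd = open("/proc/self/pagemap", O_RDONLY);
    if (fd < 0) {
        free(mem);
        return -1;
    }
    for (size_t i = 0; i < npages; i++) {
        uint64_t entry;
        /* One 8-byte entry per virtual page, indexed by virtual page number. */
        off_t off = (off_t)(((uintptr_t)mem / page + i) * sizeof(entry));
        if (pread(fd, &entry, sizeof(entry), off) != (ssize_t)sizeof(entry)) {
            ret = -1;
            break;
        }
        pfn[i] = pagemap_pfn(entry);
    }
    close(fd);
    free(mem);
    return ret;
}
```

To use it, call sample_pfns() on a few thousand pages and then pass the result to count_order(). If the behaviour described above is present, the descending count should be much larger than the ascending one.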
..

I'm actually running the treadmill right now (have been for many hours, actually)
to bisect it to a specific commit.

Thought I was almost done, and then noticed that git-bisect doesn't keep
the Makefile VERSION lines the same, so I was actually running the wrong
kernel after the first few times.. duh.

Wrote a script to fix it now.
..

Well, that was a waste of three hours.
..

Ahh.. it seems to be sensitive to one/both of these:

CONFIG_HIGHMEM64G=y with 4GB RAM: not so bad, frequently does 20KB - 48KB segments.
CONFIG_HIGHMEM4G=y with 2GB RAM: very severe, rarely does more than 8KB segments.
CONFIG_HIGHMEM4G=y with 3GB RAM: very severe, rarely does more than 8KB segments.

So if you want to reproduce this on a large memory machine, use "mem=2GB" for starters.
..

Here's the commit that causes the regression:

535131e6925b4a95f321148ad7293f496e0e58d7 Choose pages from the per-cpu list based on migration type


And here is a patch that seems to fix it for me here:

* * * *

Fix page allocator to give better chance of larger contiguous segments (again).

Signed-off-by: Mark Lord <mlord@xxxxxxxxx>
---


--- old/mm/page_alloc.c.orig	2007-12-13 19:25:15.000000000 -0500
+++ linux-2.6/mm/page_alloc.c	2007-12-13 19:35:50.000000000 -0500
@@ -954,7 +954,7 @@
 				goto failed;
 		}
 		/* Find a page of the appropriate migrate type */
-		list_for_each_entry(page, &pcp->list, lru) {
+		list_for_each_entry_reverse(page, &pcp->list, lru) {
 			if (page_private(page) == migratetype) {
 				list_del(&page->lru);
 				pcp->count--;
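
To illustrate why the walk direction could matter, here is a toy userspace model, not kernel code. It assumes (this is my assumption, not something stated in the thread) that the per-cpu list is refilled by pushing physically ascending pages onto the list head, and that every page matches the requested migrate type. Under those assumptions a forward walk hands pages out in descending order, while a reverse walk hands them out in ascending order. The struct and all toy_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define BATCH 8

/* Toy stand-in for the per-cpu page list: slot 0 is the list head. */
struct toy_pcp {
    unsigned long pfn[BATCH];
    size_t count;
};

/* Add at the head, like list_add(): existing entries shift towards the tail. */
void toy_add_head(struct toy_pcp *pcp, unsigned long pfn)
{
    assert(pcp->count < BATCH);
    for (size_t i = pcp->count; i > 0; i--)
        pcp->pfn[i] = pcp->pfn[i - 1];
    pcp->pfn[0] = pfn;
    pcp->count++;
}

/* Remove and return the first entry a forward walk would reach (the head)
 * or the first entry a reverse walk would reach (the tail). */
unsigned long toy_take(struct toy_pcp *pcp, int from_tail)
{
    unsigned long pfn;

    assert(pcp->count > 0);
    if (from_tail) {
        pfn = pcp->pfn[pcp->count - 1];
    } else {
        pfn = pcp->pfn[0];
        for (size_t i = 1; i < pcp->count; i++)
            pcp->pfn[i - 1] = pcp->pfn[i];
    }
    pcp->count--;
    return pfn;
}

/* Refill with BATCH ascending PFNs starting at base, then drain into out[]. */
void toy_refill_and_drain(unsigned long base, int from_tail, unsigned long *out)
{
    struct toy_pcp pcp = { .count = 0 };

    for (unsigned long i = 0; i < BATCH; i++)
        toy_add_head(&pcp, base + i);
    for (size_t i = 0; i < BATCH; i++)
        out[i] = toy_take(&pcp, from_tail);
}
```

With base 100, draining from the head yields 107 down to 100, and draining from the tail yields 100 up to 107. The real list can hold pages of mixed migrate types and is refilled and drained incrementally, so this only shows the ordering effect in the simplest case.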