Re: Performance regression in scsi sequential throughput (iozone) due to "e084b - page-allocator: preserve PFN ordering when __GFP_COLD is set"

From: Mel Gorman
Date: Mon Feb 08 2010 - 11:56:17 EST


> <SNIP>
> The prototype patch for avoiding congestion_wait is below. I'll start
> work on a fallback-to-other-percpu-lists patch.
>

And here is the prototype of the fallback-to-other-percpu-lists patch.
I'm afraid I've only managed to test it on qemu. My three test machines are
still occupied :(
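
For anyone who wants to poke at the decision logic without booting a kernel,
the behaviour described in the changelog below can be sketched as a small
userspace model. This is only an illustration under stated assumptions:
struct zone_model, struct pcp_model, pcp_total(), pcp_pick_list() and
fallback_order[] are hypothetical stand-ins, not kernel interfaces, and the
fallback table is an assumption modelled on the kernel's fallbacks[] array.

```c
#include <assert.h>

/* Hypothetical userspace model of the patch; none of this is kernel API. */
enum { MT_UNMOVABLE, MT_RECLAIMABLE, MT_MOVABLE, MT_PCPTYPES };

/* Assumed fallback order per starting type, modelled on fallbacks[]. */
static const int fallback_order[MT_PCPTYPES][MT_PCPTYPES - 1] = {
	[MT_UNMOVABLE]   = { MT_RECLAIMABLE, MT_MOVABLE },
	[MT_RECLAIMABLE] = { MT_UNMOVABLE,   MT_MOVABLE },
	[MT_MOVABLE]     = { MT_RECLAIMABLE, MT_UNMOVABLE },
};

struct pcp_model {
	int count[MT_PCPTYPES];	/* pages held on each per-type list */
	int batch;		/* pages moved by one refill */
};

struct zone_model {
	long free_pages;	/* stand-in for NR_FREE_PAGES */
	long low_wmark;		/* stand-in for low_wmark_pages(zone) */
};

/* Total pages across all lists, the model's equivalent of pcp->count. */
static int pcp_total(const struct pcp_model *pcp)
{
	int mt, total = 0;

	for (mt = 0; mt < MT_PCPTYPES; mt++)
		total += pcp->count[mt];
	return total;
}

/*
 * Return the migratetype whose list should satisfy the allocation.
 * If a refill would leave the zone at or below its low watermark and
 * another list has pages, borrow from that list. Otherwise refill the
 * list that was asked for.
 */
static int pcp_pick_list(struct zone_model *zone, struct pcp_model *pcp,
			 int start)
{
	long after_refill, take;
	int i;

	assert(start >= 0 && start < MT_PCPTYPES);
	after_refill = zone->free_pages - pcp->batch;

	if (pcp_total(pcp) > 0 && after_refill <= zone->low_wmark) {
		for (i = 0; i < MT_PCPTYPES - 1; i++) {
			int mt = fallback_order[start][i];

			if (pcp->count[mt] > 0)
				return mt;
		}
	}

	/* Refill: move up to one batch from the zone to the start list. */
	take = pcp->batch < zone->free_pages ? pcp->batch : zone->free_pages;
	if (take < 0)
		take = 0;
	pcp->count[start] += (int)take;
	zone->free_pages -= take;
	return start;
}
```

With 100 free pages, a low watermark of 90 and a batch of 16, a request for
MT_UNMOVABLE is served from a non-empty MT_MOVABLE list rather than refilled.
With 1000 free pages the same request refills the MT_UNMOVABLE list as before.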

==== CUT HERE ====
page allocator: Fallback to other per-cpu lists when the target list is empty and memory is low

When a per-cpu list of pages for a given migratetype is empty, the page
allocator is called to refill the PCP list. When memory is low, the refill
can push the process into direct reclaim even though it was not strictly
necessary, because pages were free on the PCP lists of other migratetypes.
Unconditionally falling back to other PCP lists would hurt the
fragmentation-avoidance strategy, which is also undesirable.

When the desired PCP list is empty, this patch checks whether any pages are
free on the PCP lists and whether refilling the list would leave the zone's
free pages at or below the low watermark, in which case direct reclaim is
possible. If direct reclaim is unlikely, the PCP list is refilled to maintain
fragmentation-avoidance. Otherwise, a page from an alternative PCP list is
chosen to maintain performance and avoid direct reclaim.

Signed-off-by: Mel Gorman <mel@xxxxxxxxx>
---
mm/page_alloc.c | 37 ++++++++++++++++++++++++++++++++++---
1 files changed, 34 insertions(+), 3 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 8deb9d0..009d683 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1168,6 +1168,39 @@ void split_page(struct page *page, unsigned int order)
 		set_page_refcounted(page + i);
 }

+/* Decide whether to find an alternative PCP list or refill */
+static struct list_head *pcp_fallback(struct zone *zone,
+			struct per_cpu_pages *pcp,
+			int start_migratetype, int cold)
+{
+	int i;
+	int migratetype;
+	struct list_head *list;
+	long free_pages = zone_page_state(zone, NR_FREE_PAGES) - pcp->batch;
+
+	/*
+	 * Find a PCP list with free pages in the same order as
+	 * fragmentation-avoidance fallback in the event that refilling
+	 * the PCP list may result in direct reclaim
+	 */
+	if (pcp->count && free_pages <= low_wmark_pages(zone)) {
+		for (i = 0; i < MIGRATE_PCPTYPES - 1; i++) {
+			migratetype = fallbacks[start_migratetype][i];
+			list = &pcp->lists[migratetype];
+
+			if (!list_empty(list))
+				return list;
+		}
+	}
+
+	/* Otherwise, refill the requested PCP list from the buddy lists */
+	list = &pcp->lists[start_migratetype];
+	pcp->count += rmqueue_bulk(zone, 0, pcp->batch, list,
+					start_migratetype, cold);
+
+	return list;
+}
+
/*
* Really, prep_compound_page() should be called from __rmqueue_bulk(). But
* we cheat by calling it from here, in the order > 0 path. Saves a branch
@@ -1193,9 +1226,7 @@ again:
 		list = &pcp->lists[migratetype];
 		local_irq_save(flags);
 		if (list_empty(list)) {
-			pcp->count += rmqueue_bulk(zone, 0,
-					pcp->batch, list,
-					migratetype, cold);
+			list = pcp_fallback(zone, pcp, migratetype, cold);
 			if (unlikely(list_empty(list)))
 				goto failed;
 		}