Re: [PATCH] mm: Do not stall in synchronous compaction for THP allocations

From: David Rientjes
Date: Tue Nov 15 2011 - 19:07:17 EST


On Tue, 15 Nov 2011, Mel Gorman wrote:

> Adding sync here could obviously be implemented although it may
> require both always-sync and madvise-sync. Alternatively, something
> like an options file could be created to create a bitmap similar to
> what ftrace does. Whatever the mechanism, it exposes the fact that
> "sync compaction" is used. If that turns out to be not enough, then
> you may want to add other steps like aggressively reclaiming memory
> which also potentially may need to be controlled via the sysfs file
> and this is the slippery slope.
>

So what's being proposed here in this patch is the fifth time this line
has been changed, and it's always been switched between true and !(gfp_mask
& __GFP_NO_KSWAPD). Instead of changing it every few months, I'd suggest
that we tie the semantics of the tunable directly to sync_compaction since
we're primarily targeting thp hugepages with this change anyway for the
"always" case. Comments?
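
To make the proposed semantics concrete, here is a minimal userspace sketch
of the decision the patch below makes. It is an illustration only, not
kernel code: the SKETCH_* constants and the use_sync_migration() helper are
hypothetical stand-ins, and the real values of __GFP_NO_KSWAPD and
TRANSPARENT_HUGEPAGE_DEFRAG_FLAG differ.

```c
#include <stdbool.h>

/* Hypothetical stand-ins; the real kernel definitions differ. */
#define SKETCH_GFP_NO_KSWAPD   (1u << 0) /* marks a THP allocation here */
#define SKETCH_THP_DEFRAG_FLAG 2         /* bit index for defrag=always */

/*
 * Mirrors the expression in the patch: use synchronous migration for
 * every non-THP allocation, and for THP allocations only when the
 * defrag tunable is set to "always".
 */
static bool use_sync_migration(unsigned int gfp_mask,
                               unsigned long thp_flags)
{
	bool is_thp_alloc = (gfp_mask & SKETCH_GFP_NO_KSWAPD) != 0;
	bool defrag_always =
		(thp_flags & (1ul << SKETCH_THP_DEFRAG_FLAG)) != 0;

	return !is_thp_alloc || defrag_always;
}
```

In this sketch a THP allocation with the defrag bit clear is the only case
that stays asynchronous, which is the behavior the "madvise" and "never"
settings would be expected to give.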

diff --git a/Documentation/vm/transhuge.txt b/Documentation/vm/transhuge.txt
--- a/Documentation/vm/transhuge.txt
+++ b/Documentation/vm/transhuge.txt
@@ -116,6 +116,13 @@ echo always >/sys/kernel/mm/transparent_hugepage/defrag
echo madvise >/sys/kernel/mm/transparent_hugepage/defrag
echo never >/sys/kernel/mm/transparent_hugepage/defrag

+If defrag is set to "always", then all hugepage allocations also attempt
+synchronous memory compaction which makes the allocation as aggressive
+as possible. The overhead of attempting to allocate the hugepage is
+considered acceptable because of the long-term benefits of the hugepage
+itself at runtime. If the VM should fall back to using regular pages
+instead, then you should use "madvise" or "never".
+
khugepaged will be automatically started when
transparent_hugepage/enabled is set to "always" or "madvise, and it'll
be automatically shutdown if it's set to "never".

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2168,7 +2168,17 @@ rebalance:
sync_migration);
if (page)
goto got_pg;
- sync_migration = true;
+
+ /*
+ * Do not use synchronous migration for transparent hugepages unless
+ * defragmentation is always attempted for such allocations since it
+ * can stall in writeback, which is far worse than simply failing to
+ * promote a page. Otherwise, we really do want a hugepage and are as
+ * aggressive as possible to allocate it.
+ */
+ sync_migration = !(gfp_mask & __GFP_NO_KSWAPD) ||
+ (transparent_hugepage_flags &
+ (1 << TRANSPARENT_HUGEPAGE_DEFRAG_FLAG));

/* Try direct reclaim and then allocating */
page = __alloc_pages_direct_reclaim(gfp_mask, order,