Re: [PATCH] mm: get rid of unnecessary pageblock scanning in setup_zone_migrate_reserve

From: Mel Gorman
Date: Wed Oct 30 2013 - 11:19:15 EST


On Wed, Oct 23, 2013 at 05:01:32PM -0400, kosaki.motohiro@xxxxxxxxx wrote:
> From: KOSAKI Motohiro <kosaki.motohiro@xxxxxxxxxxxxxx>
>
> Yasuaki Ishimatsu reported that memory hot-add spent more than 5 _hours_
> on a 9TB memory machine, and we found that setup_zone_migrate_reserve
> accounted for >90% of that time.
>
> The problem is that setup_zone_migrate_reserve scans all pageblocks
> unconditionally, but a full scan is only necessary when the number of
> reserved blocks has been reduced (i.e. memory hot remove).
> Moreover, the maximum number of MIGRATE_RESERVE blocks per zone is
> currently 2, so the number of reserved pageblocks almost never changes.
>
> This patch adds zone->nr_migrate_reserve_block to track the number
> of MIGRATE_RESERVE pageblocks, which reduces the overhead of
> setup_zone_migrate_reserve dramatically.
>

It seems regrettable to expand the size of struct zone just for this.
You are right that the number of blocks does not exceed 2 because of a
check made in setup_zone_migrate_reserve, so it should be possible to
special-case this. I didn't test this or think about it particularly
carefully, and no doubt there is a nicer way, but for illustration
purposes see the patch below.

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index dd886fa..1aedddd 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3897,6 +3897,8 @@ static int pageblock_is_reserved(unsigned long start_pfn, unsigned long end_pfn)
return 0;
}

+#define MAX_MIGRATE_RESERVE_BLOCKS 2
+
/*
* Mark a number of pageblocks as MIGRATE_RESERVE. The number
* of blocks reserved is based on min_wmark_pages(zone). The memory within
@@ -3910,6 +3912,7 @@ static void setup_zone_migrate_reserve(struct zone *zone)
struct page *page;
unsigned long block_migratetype;
int reserve;
+ int found = 0;

/*
* Get the start pfn, end pfn and the number of blocks to reserve
@@ -3926,11 +3929,11 @@ static void setup_zone_migrate_reserve(struct zone *zone)
/*
* Reserve blocks are generally in place to help high-order atomic
* allocations that are short-lived. A min_free_kbytes value that
- * would result in more than 2 reserve blocks for atomic allocations
- * is assumed to be in place to help anti-fragmentation for the
- * future allocation of hugepages at runtime.
+ * would result in more than MAX_MIGRATE_RESERVE_BLOCKS reserve blocks
+ * for atomic allocations is assumed to be in place to help
+ * anti-fragmentation for the future allocation of hugepages at runtime.
*/
- reserve = min(2, reserve);
+ reserve = min(MAX_MIGRATE_RESERVE_BLOCKS, reserve);

for (pfn = start_pfn; pfn < end_pfn; pfn += pageblock_nr_pages) {
if (!pfn_valid(pfn))
@@ -3956,6 +3959,7 @@ static void setup_zone_migrate_reserve(struct zone *zone)
/* If this block is reserved, account for it */
if (block_migratetype == MIGRATE_RESERVE) {
reserve--;
+ found++;
continue;
}

@@ -3970,6 +3974,10 @@ static void setup_zone_migrate_reserve(struct zone *zone)
}
}

+ /* If all possible reserve blocks have been found, we're done */
+ if (found >= MAX_MIGRATE_RESERVE_BLOCKS)
+ break;
+
/*
* If the reserve is met and this is a previous reserved block,
* take it back
