[PATCH] page_alloc: fix the incorrect adjustment to zone->present_pages

From: Lai Jiangshan
Date: Fri Oct 26 2012 - 05:57:50 EST


Current free_area_init_core() has incorrect adjustment code to adjust
->present_pages. It will cause ->present_pages overflow, make the
system unusable(can't create any process/thread in our test) and cause further problem.

Details:
1) Some/many ZONEs don't have memory which is used by memmap.
{ Or all the actual memory used for memmap is much less than the "memmap_pages"
(memmap_pages = PAGE_ALIGN(span_size * sizeof(struct page)) >> PAGE_SHIFT)
CONFIG_SPARSEMEM is an example. }

2) incorrect adjustment in free_area_init_core(): zone->present_pages -= memmap_pages
3) but the zone has big hole, it causes the result of zone->present_pages become much smaller
4) when we offline a/several memory section of the zone: zone->present_pages -= offline_size
5) Now, zone->present_pages will/may be *OVERFLOW*.

So the adjustment is dangerous and incorrect.

Addition 1:
And in current kernel, the memmaps have nothing related/bound to any ZONE:
FLATMEM: global memmap
CONFIG_DISCONTIGMEM: node-specific memmap
CONFIG_SPARSEMEM: memorysection-specific memmap
None of them is ZONE-specific memmap, and the memory used for memmap is not bound to any ZONE.
So the adjustment "zone->present_pages -= memmap_pages" subtracts unrelated value
and makes no sense.

Addition 2:
We introduced this adjustment and tried to make page-reclaim/watermark happier,
but the adjustment is wrong in current kernel, and even makes page-reclaim/watermark
worse. It is against its original purpose/reason.

This adjustment is incorrect/buggy, subtracts unrelated value and violates its original
purpose, so we simply remove the adjustment.

CC: Mel Gorman <mgorman@xxxxxxx>
Signed-off-by: Lai Jiangshan <laijs@xxxxxxxxxxxxxx>
---
mm/page_alloc.c | 20 +-------------------
1 files changed, 1 insertions(+), 19 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index bb90971..6bf72e3 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4455,30 +4455,12 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat,

for (j = 0; j < MAX_NR_ZONES; j++) {
struct zone *zone = pgdat->node_zones + j;
- unsigned long size, realsize, memmap_pages;
+ unsigned long size, realsize;

size = zone_spanned_pages_in_node(nid, j, zones_size);
realsize = size - zone_absent_pages_in_node(nid, j,
zholes_size);

- /*
- * Adjust realsize so that it accounts for how much memory
- * is used by this zone for memmap. This affects the watermark
- * and per-cpu initialisations
- */
- memmap_pages =
- PAGE_ALIGN(size * sizeof(struct page)) >> PAGE_SHIFT;
- if (realsize >= memmap_pages) {
- realsize -= memmap_pages;
- if (memmap_pages)
- printk(KERN_DEBUG
- " %s zone: %lu pages used for memmap\n",
- zone_names[j], memmap_pages);
- } else
- printk(KERN_WARNING
- " %s zone: %lu pages exceeds realsize %lu\n",
- zone_names[j], memmap_pages, realsize);
-
/* Account for reserved pages */
if (j == 0 && realsize > dma_reserve) {
realsize -= dma_reserve;
--
1.7.4.4

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/