[PATCH -v3] nobootmem/bootmem, x86: Fix 32bit numa system without RAM on Node0

From: Yinghai Lu
Date: Wed Mar 31 2010 - 22:03:57 EST



On one system without RAM on node 0, I got the following dump with a 32-bit NUMA kernel:

early_node_map[4] active PFN ranges
1: 0x00000010 -> 0x00000099
1: 0x00000100 -> 0x0007da00
1: 0x0007e800 -> 0x0007ffa0
1: 0x0007ffae -> 0x0007ffb0

Subtract (29 early reservations)
#000 [0000001000 - 0000002000]
#001 [0000089000 - 000008f000]
#002 [0000091000 - 0000093500]
#003 [0000094000 - 0000099000]
#004 [0000099400 - 0000100000]
#005 [0000200000 - 0000eb7644]
#006 [0000eb8000 - 0000ec327c]
#007 [007c400000 - 007c40e000]
#008 [007c440000 - 007c44e000]
#009 [007c480000 - 007c48e000]
#010 [007c4c0000 - 007c4ce000]
#011 [007c500000 - 007c50e000]
#012 [007c540000 - 007c54e000]
#013 [007c580000 - 007c58e000]
#014 [007c5c0000 - 007c5ce000]
#015 [007c674000 - 007cbfe000]
#016 [007cbfe500 - 007cbfe530]
#017 [007cbfe540 - 007cbfe5d0]
#018 [007cbfe600 - 007cbfe620]
#019 [007cbfe640 - 007cbfe660]
#020 [007cbfe680 - 007cbfe684]
#021 [007cbfe6c0 - 007cbfe6c4]
#022 [007cbfe700 - 007cbfe77e]
#023 [007cbfe780 - 007cbfe7fe]
#024 [007cbfe800 - 007cbfec54]
#025 [007cbfec80 - 007cbfeede]
#026 [007cbfef00 - 007cbfef2d]
#027 [007cbfef40 - 007e800000]
#028 [007e9ca000 - 007ff95000]
(0 free memory ranges)
Initializing HighMem for node 0 (00000000:00000000)
Initializing HighMem for node 1 (00000000:00000000)
Memory: 0k/2096832k available (6662k kernel code, 2096300k reserved, 4829k data, 484k init, 0k highmem)
virtual kernel memory layout:
fixmap : 0xff637000 - 0xfffff000 (10016 kB)
pkmap : 0xff200000 - 0xff400000 (2048 kB)
vmalloc : 0xc07b0000 - 0xff1fe000 (1002 MB)
lowmem : 0x40000000 - 0xbffb0000 (2047 MB)
.init : 0x40d39000 - 0x40db2000 ( 484 kB)
.data : 0x40881924 - 0x40d38e1c (4829 kB)
.text : 0x40200000 - 0x40881924 (6662 kB)
Checking if this processor honours the WP bit even in supervisor mode...Ok.
swapper: page allocation failure. order:0, mode:0x0
Pid: 0, comm: swapper Not tainted 2.6.34-rc3-tip-03818-g4b1ea6c-dirty #35
Call Trace:
[<4087a5dc>] ? printk+0xf/0x11
[<40286728>] __alloc_pages_nodemask+0x417/0x487
[<402a9ce1>] new_slab+0xe2/0x1fe
[<402aa5b2>] kmem_cache_open+0x185/0x358
[<402abbc0>] T.954+0x1c/0x60
[<40d52a29>] kmem_cache_init+0x24/0x113
[<40d39738>] start_kernel+0x166/0x2e4
[<40d3940e>] ? unknown_bootoption+0x0/0x18e
[<40d390ce>] i386_start_kernel+0xce/0xd5
Mem-Info:
Node 1 DMA per-cpu:
CPU 0: hi: 0, btch: 1 usd: 0
Node 1 Normal per-cpu:
CPU 0: hi: 0, btch: 1 usd: 0
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0 unstable:0
free:0 slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0 bounce:0

When 32-bit NUMA is used, free_all_bootmem() still only walks the ranges
that belong to node id 0.

If node 0 doesn't have RAM installed, we need to go with node 1,
because early_node_map[] then uses node id 1 for all ranges, and the RAM
from node 1 becomes low RAM.

Use MAX_NUMNODES, like the 64-bit NUMA code does.

Also fix the BOOTMEM path by looping over bdata_list.
Note: this bug existed before we had NO_BOOTMEM support.

-v3: add more comments, and fix bootmem path too.

Signed-off-by: Yinghai Lu <yinghai@xxxxxxxxxx>

---
mm/bootmem.c | 17 +++++++++++++++--
1 file changed, 15 insertions(+), 2 deletions(-)

Index: linux-2.6/mm/bootmem.c
===================================================================
--- linux-2.6.orig/mm/bootmem.c
+++ linux-2.6/mm/bootmem.c
@@ -303,9 +303,22 @@ unsigned long __init free_all_bootmem_no
unsigned long __init free_all_bootmem(void)
{
#ifdef CONFIG_NO_BOOTMEM
- return free_all_memory_core_early(NODE_DATA(0)->node_id);
+ /*
+ * We need to use MAX_NUMNODES instead of NODE_DATA(0)->node_id
+ * because in some cases, e.g. when Node0 doesn't have RAM installed,
+ * low RAM will be on Node1.
+ * Using MAX_NUMNODES makes sure all ranges in early_node_map[]
+ * are used instead of only the Node0-related ones.
+ */
+ return free_all_memory_core_early(MAX_NUMNODES);
#else
- return free_all_bootmem_core(NODE_DATA(0)->bdata);
+ unsigned long total_pages = 0;
+ bootmem_data_t *bdata;
+
+ list_for_each_entry(bdata, &bdata_list, list)
+ total_pages += free_all_bootmem_core(bdata);
+
+ return total_pages;
#endif
}
