Re: [PATCH -v3] nobootmem/bootmem, x86: Fix 32bit numa system without RAM on Node0

From: H. Peter Anvin
Date: Wed Mar 31 2010 - 23:22:11 EST


Please address the separate bug fix in a separate patch.

"Yinghai Lu" <yinghai@xxxxxxxxxx> wrote:

>
>On a system without RAM on node 0, a 32-bit NUMA kernel produced the following dump:
>
>early_node_map[4] active PFN ranges
> 1: 0x00000010 -> 0x00000099
> 1: 0x00000100 -> 0x0007da00
> 1: 0x0007e800 -> 0x0007ffa0
> 1: 0x0007ffae -> 0x0007ffb0
>
>Subtract (29 early reservations)
> #000 [0000001000 - 0000002000]
> #001 [0000089000 - 000008f000]
> #002 [0000091000 - 0000093500]
> #003 [0000094000 - 0000099000]
> #004 [0000099400 - 0000100000]
> #005 [0000200000 - 0000eb7644]
> #006 [0000eb8000 - 0000ec327c]
> #007 [007c400000 - 007c40e000]
> #008 [007c440000 - 007c44e000]
> #009 [007c480000 - 007c48e000]
> #010 [007c4c0000 - 007c4ce000]
> #011 [007c500000 - 007c50e000]
> #012 [007c540000 - 007c54e000]
> #013 [007c580000 - 007c58e000]
> #014 [007c5c0000 - 007c5ce000]
> #015 [007c674000 - 007cbfe000]
> #016 [007cbfe500 - 007cbfe530]
> #017 [007cbfe540 - 007cbfe5d0]
> #018 [007cbfe600 - 007cbfe620]
> #019 [007cbfe640 - 007cbfe660]
> #020 [007cbfe680 - 007cbfe684]
> #021 [007cbfe6c0 - 007cbfe6c4]
> #022 [007cbfe700 - 007cbfe77e]
> #023 [007cbfe780 - 007cbfe7fe]
> #024 [007cbfe800 - 007cbfec54]
> #025 [007cbfec80 - 007cbfeede]
> #026 [007cbfef00 - 007cbfef2d]
> #027 [007cbfef40 - 007e800000]
> #028 [007e9ca000 - 007ff95000]
>(0 free memory ranges)
>Initializing HighMem for node 0 (00000000:00000000)
>Initializing HighMem for node 1 (00000000:00000000)
>Memory: 0k/2096832k available (6662k kernel code, 2096300k reserved, 4829k data, 484k init, 0k highmem)
>virtual kernel memory layout:
> fixmap : 0xff637000 - 0xfffff000 (10016 kB)
> pkmap : 0xff200000 - 0xff400000 (2048 kB)
> vmalloc : 0xc07b0000 - 0xff1fe000 (1002 MB)
> lowmem : 0x40000000 - 0xbffb0000 (2047 MB)
> .init : 0x40d39000 - 0x40db2000 ( 484 kB)
> .data : 0x40881924 - 0x40d38e1c (4829 kB)
> .text : 0x40200000 - 0x40881924 (6662 kB)
>Checking if this processor honours the WP bit even in supervisor mode...Ok.
>swapper: page allocation failure. order:0, mode:0x0
>Pid: 0, comm: swapper Not tainted 2.6.34-rc3-tip-03818-g4b1ea6c-dirty #35
>Call Trace:
> [<4087a5dc>] ? printk+0xf/0x11
> [<40286728>] __alloc_pages_nodemask+0x417/0x487
> [<402a9ce1>] new_slab+0xe2/0x1fe
> [<402aa5b2>] kmem_cache_open+0x185/0x358
> [<402abbc0>] T.954+0x1c/0x60
> [<40d52a29>] kmem_cache_init+0x24/0x113
> [<40d39738>] start_kernel+0x166/0x2e4
> [<40d3940e>] ? unknown_bootoption+0x0/0x18e
> [<40d390ce>] i386_start_kernel+0xce/0xd5
>Mem-Info:
>Node 1 DMA per-cpu:
>CPU 0: hi: 0, btch: 1 usd: 0
>Node 1 Normal per-cpu:
>CPU 0: hi: 0, btch: 1 usd: 0
>active_anon:0 inactive_anon:0 isolated_anon:0
> active_file:0 inactive_file:0 isolated_file:0
> unevictable:0 dirty:0 writeback:0 unstable:0
> free:0 slab_reclaimable:0 slab_unreclaimable:0
> mapped:0 shmem:0 pagetables:0 bounce:0
>
>With 32-bit NUMA, free_all_bootmem() still only walks node id 0.
>
>If node 0 has no RAM installed, we need to walk node 1 instead,
>because early_node_map uses nid 1 for all ranges, and the RAM from
>node 1 becomes low RAM.
>
>Use MAX_NUMNODES, as 64-bit NUMA does.
>
>Also fix the BOOTMEM path by looping over bdata_list.
>Note: this bug existed before NO_BOOTMEM support was added.
>
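The node-id filtering described above can be sketched in user space. This is only an illustration, not the kernel code: struct pfn_range, count_pages() and the MAX_NUMNODES value of 8 are hypothetical stand-ins, and the "MAX_NUMNODES matches any node" rule is assumed from the patch comment. The ranges are copied from the early_node_map dump.

```c
#include <stddef.h>

/* Hypothetical value; in the kernel MAX_NUMNODES comes from the config. */
#define MAX_NUMNODES 8

/* Simplified stand-in for an early_node_map[] entry. */
struct pfn_range {
	int nid;
	unsigned long start_pfn;
	unsigned long end_pfn;
};

/* Ranges from the dump above: every entry is tagged with node 1. */
static const struct pfn_range early_map[] = {
	{ 1, 0x00000010, 0x00000099 },
	{ 1, 0x00000100, 0x0007da00 },
	{ 1, 0x0007e800, 0x0007ffa0 },
	{ 1, 0x0007ffae, 0x0007ffb0 },
};

/*
 * Count the pages in all ranges that belong to nid.
 * Assumption: nid == MAX_NUMNODES acts as a wildcard for "any node".
 */
static unsigned long count_pages(int nid)
{
	unsigned long total = 0;
	size_t i;

	for (i = 0; i < sizeof(early_map) / sizeof(early_map[0]); i++) {
		if (nid != MAX_NUMNODES && early_map[i].nid != nid)
			continue;
		total += early_map[i].end_pfn - early_map[i].start_pfn;
	}
	return total;
}
```

With this table, count_pages(0) finds no matching range, which mirrors the "(0 free memory ranges)" line in the dump, while count_pages(MAX_NUMNODES) covers all four ranges.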
>-v3: add more comments, and fix bootmem path too.
>
>Signed-off-by: Yinghai Lu <yinghai@xxxxxxxxxx>
>
>---
> mm/bootmem.c | 17 +++++++++++++++--
> 1 file changed, 15 insertions(+), 2 deletions(-)
>
>Index: linux-2.6/mm/bootmem.c
>===================================================================
>--- linux-2.6.orig/mm/bootmem.c
>+++ linux-2.6/mm/bootmem.c
>@@ -303,9 +303,22 @@ unsigned long __init free_all_bootmem_no
> unsigned long __init free_all_bootmem(void)
> {
> #ifdef CONFIG_NO_BOOTMEM
>- return free_all_memory_core_early(NODE_DATA(0)->node_id);
>+ /*
>+ * We need to use MAX_NUMNODES instead of NODE_DATA(0)->node_id
>+ * because in some cases, e.g. when Node0 has no RAM installed,
>+ * low RAM will be on Node1.
>+ * Using MAX_NUMNODES makes sure all ranges in early_node_map[]
>+ * are used instead of only the Node0-related ones.
>+ */
>+ return free_all_memory_core_early(MAX_NUMNODES);
> #else
>- return free_all_bootmem_core(NODE_DATA(0)->bdata);
>+ unsigned long total_pages = 0;
>+ bootmem_data_t *bdata;
>+
>+ list_for_each_entry(bdata, &bdata_list, list)
>+ total_pages += free_all_bootmem_core(bdata);
>+
>+ return total_pages;
> #endif
> }
>
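For the BOOTMEM branch, a minimal user-space sketch of walking a per-node list and returning a page count. It assumes the function is meant to return the total across all nodes, so it accumulates with +=. struct fake_bdata, free_core() and free_all() are hypothetical stand-ins for bootmem_data_t, free_all_bootmem_core() and free_all_bootmem(); a plain singly linked list replaces the kernel's list_for_each_entry().

```c
#include <stddef.h>

/* Hypothetical stand-in for bootmem_data_t: one entry per node. */
struct fake_bdata {
	unsigned long free_pages;	/* pages this node would release */
	struct fake_bdata *next;	/* next node in the list, or NULL */
};

/* Stand-in for free_all_bootmem_core(): report the pages freed on one node. */
static unsigned long free_core(const struct fake_bdata *bdata)
{
	return bdata->free_pages;
}

/* Walk every node and sum the freed pages instead of looking at node 0 only. */
static unsigned long free_all(const struct fake_bdata *head)
{
	unsigned long total_pages = 0;
	const struct fake_bdata *bdata;

	for (bdata = head; bdata; bdata = bdata->next)
		total_pages += free_core(bdata);

	return total_pages;
}
```

With a plain assignment in the loop, the result would be whatever the last node in the list freed, not the sum over all nodes.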

--
Sent from my mobile phone, pardon any lack of formatting.