Re: Fix for SLUB? (was: Fwd: [PATCH v3] mm: make expand_downwardssymmetrical to expand_upwards)

From: Geert Uytterhoeven
Date: Sat Apr 23 2011 - 09:08:30 EST


[Added some CCs]

On Sat, Apr 23, 2011 at 05:47, Michael Schmitz
<schmitzmic@xxxxxxxxxxxxxx> wrote:
> Hi,
>
> node_present_pages(node) returns false:
>
> m68k_setup_node: node 0 addr 0 size 14680064
> m68k_setup_node: node 0 not present!
> m68k_setup_node: node 1 addr 16777216 size 268435456
> m68k_setup_node: node 1 not present!
>
> Changing the patch to
>
> diff --git a/arch/m68k/mm/init_mm.c b/arch/m68k/mm/init_mm.c
> --- a/arch/m68k/mm/init_mm.c
> +++ b/arch/m68k/mm/init_mm.c
> @@ -59,6 +59,7 @@ void __init m68k_setup_node(int node)
> Â Â Â }
> Â#endif
> Â Â Â pg_data_map[node].bdata = bootmem_node_data + node;
> + Â Â Â node_set_state(node, N_NORMAL_MEMORY);
> Â Â Â node_set_online(node);
> Â}
>
> i.e. ignoring the node_present_pages return value does result in a
> booting kernel even with the problematic commit included.
>
> I'll leave it to the mm experts to explain why node_present_pages
> returns zero here.
>
> Cheers,
>
> ÂMichael
>
>
>
> On Sat, Apr 23, 2011 at 2:14 PM, Michael Schmitz
> <schmitzmic@xxxxxxxxxxxxxx> wrote:
>> Looks like that wasn't helping after all. I still need to revert said
>> commit. Guess I'll have to check what node_present_pages(node) returns
>> in each case ...
>>
>> Cheers,
>>
>> ÂMIchael
>>
>>
>> On Sat, Apr 23, 2011 at 1:31 PM, Michael Schmitz
>> <schmitzmic@xxxxxxxxxxxxxx> wrote:
>>> I'll check this out - might well be the correct fix for our problems.
>>>
>>> Cheers,
>>>
>>> ÂMichael
>>>
>>>
>>> On Thu, Apr 21, 2011 at 8:19 PM, Geert Uytterhoeven
>>> <geert@xxxxxxxxxxxxxx> wrote:
>>>> ---------- Forwarded message ----------
>>>> From: David Rientjes <rientjes@xxxxxxxxxx>
>>>> Date: Thu, Apr 21, 2011 at 01:12
>>>> Subject: Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
>>>> To: James Bottomley <James.Bottomley@xxxxxxxxxxxxxxxxxxxxx>
>>>> Cc: KOSAKI Motohiro <kosaki.motohiro@xxxxxxxxxxxxxx>, Pekka Enberg
>>>> <penberg@xxxxxxxxxx>, Christoph Lameter <cl@xxxxxxxxx>, Michal Hocko
>>>> <mhocko@xxxxxxx>, Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>, Hugh
>>>> Dickins <hughd@xxxxxxxxxx>, linux-mm@xxxxxxxxx, LKML
>>>> <linux-kernel@xxxxxxxxxxxxxxx>, linux-parisc@xxxxxxxxxxxxxxx, Ingo
>>>> Molnar <mingo@xxxxxxx>, x86 maintainers <x86@xxxxxxxxxx>
>>>>
>>>>
>>>> On Wed, 20 Apr 2011, James Bottomley wrote:
>>>>
>>>>> > This is probably because the parisc's DISCONTIGMEM memory ranges don't
>>>>> > have bits set in N_NORMAL_MEMORY.
>>>>> >
>>>>> > diff --git a/arch/parisc/mm/init.c b/arch/parisc/mm/init.c
>>>>> > --- a/arch/parisc/mm/init.c
>>>>> > +++ b/arch/parisc/mm/init.c
>>>>> > @@ -266,8 +266,10 @@ static void __init setup_bootmem(void)
>>>>> > Â Â }
>>>>> > Â Â memset(pfnnid_map, 0xff, sizeof(pfnnid_map));
>>>>> >
>>>>> > - Â for (i = 0; i < npmem_ranges; i++)
>>>>> > + Â for (i = 0; i < npmem_ranges; i++) {
>>>>> > + Â Â Â Â Â node_set_state(i, N_NORMAL_MEMORY);
>>>>> > Â Â Â Â Â Â node_set_online(i);
>>>>> > + Â }
>>>>> > Â#endif
>>>>>
>>>>> Yes, this seems to be the missing piece that gets it to boot. ÂWe really
>>>>> need this in generic code, unless someone wants to run through all the
>>>>> other arch's doing it ...
>>>>>
>>>>
>>>> Looking at all other architectures that allow ARCH_DISCONTIGMEM_ENABLE, we
>>>> already know x86 is fine, avr32 disables ARCH_DISCONTIGMEM_ENABLE entirely
>>>> because its code only brings online node 0, and tile already sets the bit
>>>> in N_NORMAL_MEMORY correctly when bringing a node online, probably because
>>>> it was introduced after the various node state masks were added in
>>>> 7ea1530ab3fd back in October 2007.
>>>>
>>>> So we're really only talking about alpha, ia64, m32r, m68k, and mips and
>>>> it only seems to matter when using CONFIG_SLUB, which isn't surprising
>>>> when greping for it:
>>>>
>>>> Â Â Â Â$ grep -r N_NORMAL_MEMORY mm/*
>>>> Â Â Â Âmm/memcontrol.c: Â Â Â Âif (!node_state(node, N_NORMAL_MEMORY))
>>>> Â Â Â Âmm/memcontrol.c: Â Â Â Â Â Â Â Âif (!node_state(node, N_NORMAL_MEMORY))
>>>> Â Â Â Âmm/page_alloc.c: Â Â Â Â[N_NORMAL_MEMORY] = { { [0] = 1UL } },
>>>> Â Â Â Âmm/page_alloc.c:
>>>> node_set_state(zone_to_nid(zone), N_NORMAL_MEMORY);
>>>> Â Â Â Âmm/slub.c: Â Â Âfor_each_node_state(node, N_NORMAL_MEMORY) {
>>>> Â Â Â Âmm/slub.c: Â Â Âfor_each_node_state(node, N_NORMAL_MEMORY) {
>>>> Â Â Â Âmm/slub.c: Â Â Âfor_each_node_state(node, N_NORMAL_MEMORY) {
>>>> Â Â Â Âmm/slub.c: Â Â Âfor_each_node_state(node, N_NORMAL_MEMORY) {
>>>> Â Â Â Âmm/slub.c: Â Â Âfor_each_node_state(node, N_NORMAL_MEMORY) {
>>>> Â Â Â Âmm/slub.c: Â Â Âfor_each_node_state(node, N_NORMAL_MEMORY) {
>>>> Â Â Â Âmm/slub.c: Â Â Âfor_each_node_state(node, N_NORMAL_MEMORY) {
>>>> Â Â Â Âmm/slub.c: Â Â Â Â Â Â Âfor_each_node_state(node, N_NORMAL_MEMORY) {
>>>> Â Â Â Âmm/slub.c: Â Â Â Â Â Â Âfor_each_node_state(node, N_NORMAL_MEMORY) {
>>>> Â Â Â Âmm/slub.c: Â Â Âfor_each_node_state(node, N_NORMAL_MEMORY)
>>>>
>>>> Those memory controller occurrences only result in it passing a node id of
>>>> -1 to kmalloc_node() which means no specific node target, and that's fine
>>>> for DISCONTIGMEM since we don't care about any proximity between memory
>>>> ranges.
>>>>
>>>> This should fix the remaining architectures so they can use CONFIG_SLUB,
>>>> but I hope it can be tested by the individual arch maintainers like you
>>>> did for parisc.
>>>>
>>>> diff --git a/arch/alpha/mm/numa.c b/arch/alpha/mm/numa.c
>>>> --- a/arch/alpha/mm/numa.c
>>>> +++ b/arch/alpha/mm/numa.c
>>>> @@ -245,6 +245,7 @@ setup_memory_node(int nid, void *kernel_end)
>>>> Â Â Â Â Â Â Â Â Â Â Â Âbootmap_size, BOOTMEM_DEFAULT);
>>>> Â Â Â Âprintk(" reserving pages %ld:%ld\n", bootmap_start,
>>>> bootmap_start+PFN_UP(bootmap_size));
>>>>
>>>> + Â Â Â node_set_state(nid, N_NORMAL_MEMORY);
>>>> Â Â Â Ânode_set_online(nid);
>>>> Â}
>>>>
>>>> diff --git a/arch/ia64/mm/discontig.c b/arch/ia64/mm/discontig.c
>>>> --- a/arch/ia64/mm/discontig.c
>>>> +++ b/arch/ia64/mm/discontig.c
>>>> @@ -573,6 +573,8 @@ void __init find_memory(void)
>>>> Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Âmap>>PAGE_SHIFT,
>>>> Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Âbdp->node_min_pfn,
>>>> Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Âbdp->node_low_pfn);
>>>> + Â Â Â Â Â Â Â if (node_present_pages(node))
>>>> + Â Â Â Â Â Â Â Â Â Â Â node_set_state(node, N_NORMAL_MEMORY);
>>>> Â Â Â Â}
>>>>
>>>> Â Â Â Âefi_memmap_walk(filter_rsvd_memory, free_node_bootmem);
>>>> diff --git a/arch/m32r/kernel/setup.c b/arch/m32r/kernel/setup.c
>>>> --- a/arch/m32r/kernel/setup.c
>>>> +++ b/arch/m32r/kernel/setup.c
>>>> @@ -247,7 +247,9 @@ void __init setup_arch(char **cmdline_p)
>>>>
>>>> Â#ifdef CONFIG_DISCONTIGMEM
>>>> Â Â Â Ânodes_clear(node_online_map);
>>>> + Â Â Â node_set_state(0, N_NORMAL_MEMORY); Â Â /* always has memory */
>>>> Â Â Â Ânode_set_online(0);
>>>> + Â Â Â node_set_state(1, N_NORMAL_MEMORY); Â Â /* always has memory */
>>>> Â Â Â Ânode_set_online(1);
>>>> Â#endif /* CONFIG_DISCONTIGMEM */
>>>>
>>>> diff --git a/arch/m68k/mm/init_mm.c b/arch/m68k/mm/init_mm.c
>>>> --- a/arch/m68k/mm/init_mm.c
>>>> +++ b/arch/m68k/mm/init_mm.c
>>>> @@ -59,6 +59,8 @@ void __init m68k_setup_node(int node)
>>>> Â Â Â Â}
>>>> Â#endif
>>>> Â Â Â Âpg_data_map[node].bdata = bootmem_node_data + node;
>>>> + Â Â Â if (node_present_pages(node))
>>>> + Â Â Â Â Â Â Â node_set_state(node, N_NORMAL_MEMORY);
>>>> Â Â Â Ânode_set_online(node);
>>>> Â}
>>>>
>>>> diff --git a/arch/mips/sgi-ip27/ip27-memory.c b/arch/mips/sgi-ip27/ip27-memory.c
>>>> --- a/arch/mips/sgi-ip27/ip27-memory.c
>>>> +++ b/arch/mips/sgi-ip27/ip27-memory.c
>>>> @@ -471,6 +471,8 @@ void __init paging_init(void)
>>>>
>>>> Â Â Â Â Â Â Â Âif (end_pfn > max_low_pfn)
>>>> Â Â Â Â Â Â Â Â Â Â Â Âmax_low_pfn = end_pfn;
>>>> + Â Â Â Â Â Â Â if (end_pfn > start_pfn)
>>>> + Â Â Â Â Â Â Â Â Â Â Â node_set_state(node, N_NORMAL_MEMORY);
>>>> Â Â Â Â}
>>>> Â Â Â Âzones_size[ZONE_NORMAL] = max_low_pfn;
>>>> Â Â Â Âfree_area_init_nodes(zones_size);

Gr{oetje,eeting}s,

            Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@xxxxxxxxxxxxxx

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
             Â Â -- Linus Torvalds
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/