Re: [tip:x86/urgent] x86, numa: For each node, register the memory blocks actually used

From: H. Peter Anvin
Date: Mon Oct 11 2010 - 18:21:27 EST


On 10/11/2010 03:05 PM, David Rientjes wrote:
>>
>> Use nodememblk_range[] instead of nodes[] in order to make sure we
>> capture the actual memory blocks registered with each node. nodes[]
>> contains an extended range which spans all memory regions associated
>> with a node, but that does not mean that all the memory in between are
>> included.
>>
>> Reported-by: Russ Anderson <rja@xxxxxxx>
>> Tested-by: Russ Anderson <rja@xxxxxxx>
>> Signed-off-by: Yinghai Lu <yinghai@xxxxxxxxxx>
>> LKML-Reference: <4CB27BDF.5000800@xxxxxxxxxx>
>> Cc: David Rientjes <rientjes@xxxxxxxxxx>
>> Cc: <stable@xxxxxxxxxx> 2.6.33 .34 .35 .36
>> Signed-off-by: H. Peter Anvin <hpa@xxxxxxxxxxxxxxx>
>
> Acked-by: David Rientjes <rientjes@xxxxxxxxxx>
>
> Sorry I hadn't seen this thread earlier, I wasn't cc'd on it.

Thanks for confirming. I don't have access to any systems on which I
can verify this condition myself, but I spent some fairly serious time
this morning on code inspection, and I'm pretty sure I grok what this
patch does and that it is the right thing.

This is not just an SGI UV problem but will in fact bite any system
which has nodes with interlaced memory blocks (for example block 0
belongs to node 0, block 1 belongs to node 1, and then block 2 belongs
to node 0 again.)
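
To make the interleaved case concrete, here is a minimal userspace sketch,
not the kernel code. struct memblk, node_span_bytes(), node_block_bytes()
and the interleaved[] layout are all hypothetical names invented for this
illustration; they only mimic the idea of a per-block table (like
nodememblk_range[]) versus a per-node spanning range (like nodes[]).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GIB (1ULL << 30)

/* Hypothetical, simplified stand-in for one registered memory block. */
struct memblk {
	uint64_t start;	/* inclusive start address */
	uint64_t end;	/* exclusive end address */
	int nid;	/* owning node */
};

/* Assumed layout: block 0 -> node 0, block 1 -> node 1, block 2 -> node 0. */
const struct memblk interleaved[] = {
	{ 0 * GIB, 1 * GIB, 0 },
	{ 1 * GIB, 2 * GIB, 1 },
	{ 2 * GIB, 3 * GIB, 0 },
};

/* Size of the single extended range covering every block of node nid. */
uint64_t node_span_bytes(const struct memblk *blk, size_t n, int nid)
{
	uint64_t lo = UINT64_MAX, hi = 0;

	assert(blk != NULL);
	for (size_t i = 0; i < n; i++) {
		if (blk[i].nid != nid)
			continue;
		if (blk[i].start < lo)
			lo = blk[i].start;
		if (blk[i].end > hi)
			hi = blk[i].end;
	}
	return hi > lo ? hi - lo : 0;	/* 0 if the node has no blocks */
}

/* Sum of the blocks that actually belong to node nid. */
uint64_t node_block_bytes(const struct memblk *blk, size_t n, int nid)
{
	uint64_t sum = 0;

	assert(blk != NULL);
	for (size_t i = 0; i < n; i++)
		if (blk[i].nid == nid)
			sum += blk[i].end - blk[i].start;
	return sum;
}
```

With this assumed layout, node_span_bytes(interleaved, 3, 0) gives 3 GiB
while node_block_bytes(interleaved, 3, 0) gives 2 GiB: the spanning range
for node 0 silently swallows the 1 GiB block that belongs to node 1.
Registering per block avoids that.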

There are multiple loops after this one which use the nodes[] range,
but in fact they rely on exactly this loop to have registered the
relevant memory ranges for the node, so fixing this loop fixes the
subsequent ones. Of course, it *seriously* raises the question of why
nodes[] carries a range at all (well, other than to support bootmem,
which seems like yet another good reason to finish off bootmem).

Any help in testing would be highly appreciated. Please feel free to
involve anyone else who would likely have access to the kind of large
NUMA x86 systems which are likely to be affected.

-hpa