Re: [Patch V3 0/9] Enable memoryless node support for x86

From: Jiang Liu
Date: Wed Aug 19 2015 - 04:09:21 EST


On 2015/8/18 18:02, Tang Chen wrote:
>
> On 08/17/2015 11:18 AM, Jiang Liu wrote:
>> This is the third version to enable memoryless node support on x86
>> platforms. The previous version (https://lkml.org/lkml/2014/7/11/75)
>> blindly replaces numa_node_id()/cpu_to_node() with numa_mem_id()/
>> cpu_to_mem(). That's not the right solution as pointed out by Tejun
>> and Peter due to:
>> 1) We shouldn't shift the burden to normal slab users.
>> 2) Details of memoryless node should be hidden in arch and mm code
>> as much as possible.
>>
>> After digging into more code and documentation, we found the rules to
>> deal with memoryless node should be:
>> 1) Arch code should online corresponding NUMA node before onlining any
>> CPU or memory, otherwise it may cause invalid memory access when
>> accessing NODE_DATA(nid).
>> 2) For normal memory allocations without __GFP_THISNODE setting in the
>> gfp_flags, we should prefer numa_node_id()/cpu_to_node() instead of
>> numa_mem_id()/cpu_to_mem() because the latter loses hardware topology
>> information as pointed out by Tejun:
>> A - B - X - C - D
>> Where X is the memless node. numa_mem_id() on X would return
>> either B or C, right? If B or C can't satisfy the allocation,
>> the allocator would fallback to A from B and D for C, both of
>> which aren't optimal. It should first fall back to C or B
>> respectively, which the allocator can't do anymore because the
>> information is lost when the caller side performs numa_mem_id().
>
> Hi Liu,
>
> BTW, how is this A - B - X - C - D problem solved ?
> I don't quite follow this.
>
> I cannot tell the difference between numa_node_id()/cpu_to_node() and
> numa_mem_id()/cpu_to_mem() on this point. Even with hardware topology
> info, how could it avoid this problem ?
>
> Isn't it still possible to fall back to A from B and to D from C ?

Hi Chen,
For the imagined topology, A<->B<->X<->C<->D, where A, B, C and D have
memory and X is memoryless, the possible fallback lists are:
B: [ B, A, C, D]
X: [ B, C, A, D]
C: [ C, D, B, A]

cpu_to_mem(X) will return either B or C. Let's assume it returns B.
Then we will use "B: [ B, A, C, D]" to allocate memory for X, which
is not the optimal fallback list for X: once B is exhausted, the
allocator tries A before C, although C is closer to X. cpu_to_node(X),
on the other hand, returns X itself, so the allocator uses
"X: [ B, C, A, D]", which is the optimal fallback list for X.
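
To make the difference concrete, here is a minimal sketch in Python. It is
a toy model, not kernel code: it assumes the five nodes sit on a line,
that distance is the hop count, and that ties go to the node listed first.
The names NODES, HAS_MEMORY and fallback_list() are made up for
illustration, and cpu_to_node()/cpu_to_mem() here are simplified
stand-ins that take a node name instead of a CPU number.

```python
# Toy model only: nodes on a line A - B - X - C - D, distance = hop count.
NODES = ["A", "B", "X", "C", "D"]
HAS_MEMORY = {"A": True, "B": True, "X": False, "C": True, "D": True}


def fallback_list(node):
    """Nodes with memory, nearest first; ties go to the node listed first."""
    origin = NODES.index(node)
    candidates = [n for n in NODES if HAS_MEMORY[n]]
    return sorted(
        candidates,
        key=lambda n: (abs(NODES.index(n) - origin), NODES.index(n)),
    )


def cpu_to_node(node):
    """Stand-in: the node the CPU really sits on, even if it has no memory."""
    return node


def cpu_to_mem(node):
    """Stand-in: the nearest node that has memory (assumed tie-break: first)."""
    return fallback_list(node)[0]


# Allocating through cpu_to_mem(X) uses B's list, so topology around X is lost.
print(fallback_list(cpu_to_mem("X")))   # ['B', 'A', 'C', 'D']
# Allocating through cpu_to_node(X) uses X's own list.
print(fallback_list(cpu_to_node("X")))  # ['B', 'C', 'A', 'D']
```

In this model the two lists differ only in the second entry, which is
exactly the case that matters: what happens after B cannot satisfy the
allocation.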
Thanks!
Gerry