Re: 5.11-rc4+git: Shortest NUMA path spans too many nodes

From: Dietmar Eggemann
Date: Fri Jan 22 2021 - 05:12:53 EST


On 21/01/2021 22:17, Song Bao Hua (Barry Song) wrote:
>
>
>> -----Original Message-----
>> From: Dietmar Eggemann [mailto:dietmar.eggemann@xxxxxxx]
>> Sent: Friday, January 22, 2021 7:54 AM
>> To: Valentin Schneider <valentin.schneider@xxxxxxx>; Meelis Roos
>> <mroos@xxxxxxxx>; LKML <linux-kernel@xxxxxxxxxxxxxxx>
>> Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>; Vincent Guittot
>> <vincent.guittot@xxxxxxxxxx>; Song Bao Hua (Barry Song)
>> <song.bao.hua@xxxxxxxxxxxxx>; Mel Gorman <mgorman@xxxxxxx>
>> Subject: Re: 5.11-rc4+git: Shortest NUMA path spans too many nodes
>>
>> On 21/01/2021 19:21, Valentin Schneider wrote:
>>> On 21/01/21 19:39, Meelis Roos wrote:

[...]

>> # cat /sys/devices/system/node/node*/distance
>> 10 12 12 14 14 14 14 16
>> 12 10 14 12 14 14 12 14
>> 12 14 10 14 12 12 14 14
>> 14 12 14 10 12 12 14 14
>> 14 14 12 12 10 14 12 14
>> 14 14 12 12 14 10 14 12
>> 14 12 14 14 12 14 10 12
>> 16 14 14 14 14 12 12 10
>>
>> The '16' seems to be the culprit. What does such a topology look like?

Maybe like this:

    _________
    |       |
  .-6   0   4-.
  |  \ / \ /  |
  |   1   2   |
  |    \   \  |
  --7   3---5 |
    |   |___|_|
    |_______|
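
The links in a sketch like this can be cross-checked against the distance
table. Below is a minimal Python sketch, assuming that a distance of 12
means a direct link and that every further hop adds 2 (both are assumptions
read off the table, nothing the firmware guarantees). The helper names are
made up for illustration.

```python
from collections import deque

# Distance table as printed by
# `cat /sys/devices/system/node/node*/distance` earlier in the thread.
DIST = [
    [10, 12, 12, 14, 14, 14, 14, 16],
    [12, 10, 14, 12, 14, 14, 12, 14],
    [12, 14, 10, 14, 12, 12, 14, 14],
    [14, 12, 14, 10, 12, 12, 14, 14],
    [14, 14, 12, 12, 10, 14, 12, 14],
    [14, 14, 12, 12, 14, 10, 14, 12],
    [14, 12, 14, 14, 12, 14, 10, 12],
    [16, 14, 14, 14, 14, 12, 12, 10],
]

LOCAL = 10    # distance of a node to itself
ONE_HOP = 12  # assumption: the smallest remote distance is a direct link


def neighbours(dist):
    """Map each node to the nodes it is directly linked to (distance 12)."""
    return {i: [j for j, d in enumerate(row) if d == ONE_HOP]
            for i, row in enumerate(dist)}


def hops_from(start, adj):
    """Breadth-first search: hop count from `start` to every reachable node."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in hops:
                hops[nxt] = hops[node] + 1
                queue.append(nxt)
    return hops


adj = neighbours(DIST)
all_hops = {n: hops_from(n, adj) for n in adj}

# Node pairs that are three hops apart.
far_pairs = [(a, b) for a in adj for b in adj
             if a < b and all_hops[a][b] == 3]

for node, linked in adj.items():
    print(node, "->", linked)
print("three hops apart:", far_pairs)
```

Under those assumptions every entry of the table equals 10 + 2 * hops, and
nodes 0 and 7 are the only pair three hops apart, which is where the '16'
comes from.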

>
> Once we get a topology like this:
>
>
> +------+         +------+        +-------+       +------+
> | node |         |node  |        | node  |       |node  |
> |      +---------+      +--------+       +-------+      |
> +------+         +------+        +-------+       +------+
>
> We can reproduce this issue.
> For example, with the numa_distance table below, every CPU can trigger
> "groups don't span domain->span":
> node   0   1   2   3
>   0:  10  12  20  22
>   1:  12  10  22  24
>   2:  20  22  10  12
>   3:  22  24  12  10
                             2     20     2
So this should look like: 1 --- 0 ---- 2 --- 3
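
A quick way to sanity-check that a table like the one quoted above really
describes a straight line of nodes is sketched below. This is only a
sketch: it assumes node 1 is one end of the line (it has the largest
maximum distance, 24) and that costs above the local distance of 10 add up
along the line. The function names are made up for illustration.

```python
# Distance table quoted above (nodes 0..3).
DIST = [
    [10, 12, 20, 22],
    [12, 10, 22, 24],
    [20, 22, 10, 12],
    [22, 24, 12, 10],
]
LOCAL = 10  # distance of a node to itself


def chain_from(end, dist):
    """Order the nodes by their distance from the assumed end node."""
    return sorted(range(len(dist)), key=lambda n: dist[end][n])


def cost(a, b, dist):
    """Cost of getting from a to b above the local distance."""
    return dist[a][b] - LOCAL


def is_additive(chain, dist):
    """True if costs add up for every a..b..c taken in chain order."""
    n = len(chain)
    return all(
        cost(chain[i], chain[k], dist)
        == cost(chain[i], chain[j], dist) + cost(chain[j], chain[k], dist)
        for i in range(n)
        for j in range(i + 1, n)
        for k in range(j + 1, n)
    )


chain = chain_from(1, DIST)  # assumption: node 1 is one end of the line
link_costs = [cost(a, b, DIST) for a, b in zip(chain, chain[1:])]

print("order:", chain)
print("per-link cost above local:", link_costs)
print("additive:", is_additive(chain, DIST))
```

With the table above the order comes out as 1, 0, 2, 3 and the costs are
additive. Note that the per-link values printed here are costs above the
local distance, so the 0-2 link (raw distance 20) shows up as 10.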