RE: 5.11-rc4+git: Shortest NUMA path spans too many nodes

From: Song Bao Hua (Barry Song)
Date: Fri Jan 22 2021 - 06:53:39 EST




> -----Original Message-----
> From: Dietmar Eggemann [mailto:dietmar.eggemann@xxxxxxx]
> Sent: Friday, January 22, 2021 11:05 PM
> To: Song Bao Hua (Barry Song) <song.bao.hua@xxxxxxxxxxxxx>; Valentin Schneider
> <valentin.schneider@xxxxxxx>; Meelis Roos <mroos@xxxxxxxx>; LKML
> <linux-kernel@xxxxxxxxxxxxxxx>
> Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>; Vincent Guittot
> <vincent.guittot@xxxxxxxxxx>; Mel Gorman <mgorman@xxxxxxx>
> Subject: Re: 5.11-rc4+git: Shortest NUMA path spans too many nodes
>
> On 21/01/2021 22:17, Song Bao Hua (Barry Song) wrote:
> >
> >
> >> -----Original Message-----
> >> From: Dietmar Eggemann [mailto:dietmar.eggemann@xxxxxxx]
> >> Sent: Friday, January 22, 2021 7:54 AM
> >> To: Valentin Schneider <valentin.schneider@xxxxxxx>; Meelis Roos
> >> <mroos@xxxxxxxx>; LKML <linux-kernel@xxxxxxxxxxxxxxx>
> >> Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>; Vincent Guittot
> >> <vincent.guittot@xxxxxxxxxx>; Song Bao Hua (Barry Song)
> >> <song.bao.hua@xxxxxxxxxxxxx>; Mel Gorman <mgorman@xxxxxxx>
> >> Subject: Re: 5.11-rc4+git: Shortest NUMA path spans too many nodes
> >>
> >> On 21/01/2021 19:21, Valentin Schneider wrote:
> >>> On 21/01/21 19:39, Meelis Roos wrote:
>
> [...]
>
> >> # cat /sys/devices/system/node/node*/distance
> >> 10 12 12 14 14 14 14 16
> >> 12 10 14 12 14 14 12 14
> >> 12 14 10 14 12 12 14 14
> >> 14 12 14 10 12 12 14 14
> >> 14 14 12 12 10 14 12 14
> >> 14 14 12 12 14 10 14 12
> >> 14 12 14 14 12 14 10 12
> >> 16 14 14 14 14 12 12 10
> >>
> >> The '16' seems to be the culprit. What does such a topo look like?
>
> Maybe like this:
>
>       _________
>       |       |
>     .-6   0   4-.
>     |  \ / \ /  |
>     |   1   2   |
>     |    \   \  |
>     --7  3----5 |
>       |  |____|_|
>       |_______|
>
> >
> > Once we get a topology like this:
> >
> >
> > +------+         +------+        +-------+       +------+
> > | node |         |node  |        | node  |       |node  |
> > |      +---------+      +--------+       +-------+      |
> > +------+         +------+        +-------+       +------+
> >
> > we can reproduce this issue.
> > For example, with the numa_distance table below, every CPU can hit
> > "groups don't span domain->span":
> > node   0   1   2   3
> >   0:  10  12  20  22
> >   1:  12  10  22  24
> >   2:  20  22  10  12
> >   3:  22  24  12  10
>                              2      20     2
> So this should look like: 1 --- 0 ---- 2 --- 3

Yes. So here we are facing another problem:
kernel/sched/topology.c assumes that node_distance(0,j)
includes all distances in node_distance(i,j):

void sched_init_numa(void)
{
	...
	/*
	 * ...
	 * Assumes node_distance(0,j) includes all distances in
	 * node_distance(i,j) in order to avoid cubic time.
	 */
	next_distance = curr_distance;
	for (i = 0; i < nr_node_ids; i++) {
		for (j = 0; j < nr_node_ids; j++) {
			for (k = 0; k < nr_node_ids; k++) {
				...
			}
		}
	}
	...
}

But obviously that does not hold in our case: node 0 only sees the
distances 10, 12, 20 and 22, while node_distance(1,3) is 24. Right now
we are seeing some performance decrease because of this; I will probably
start another thread for it.
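
To make the mismatch concrete, here is a minimal user-space sketch using
the 4-node table quoted above. It is not the kernel code: node_dist[] and
the helpers add_unique(), unique_from_node0(), unique_from_all() and
contains() are hypothetical names made up for this illustration. It only
compares the set of unique distances visible from node 0 with the set
found by scanning the whole table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_NODES 4

/* Distance table quoted in this thread (nodes connected in a line). */
static const int node_dist[NR_NODES][NR_NODES] = {
	{ 10, 12, 20, 22 },
	{ 12, 10, 22, 24 },
	{ 20, 22, 10, 12 },
	{ 22, 24, 12, 10 },
};

/* Hypothetical helper: append d to out[] unless it is already there. */
static size_t add_unique(int *out, size_t n, int d)
{
	for (size_t i = 0; i < n; i++)
		if (out[i] == d)
			return n;
	assert(n < (size_t)NR_NODES * NR_NODES);
	out[n] = d;
	return n + 1;
}

/* Unique distances in row 0 only, i.e. what the assumption relies on. */
static size_t unique_from_node0(int *out)
{
	size_t n = 0;

	for (int j = 0; j < NR_NODES; j++)
		n = add_unique(out, n, node_dist[0][j]);
	return n;
}

/* Unique distances over every (i, j) pair of the table. */
static size_t unique_from_all(int *out)
{
	size_t n = 0;

	for (int i = 0; i < NR_NODES; i++)
		for (int j = 0; j < NR_NODES; j++)
			n = add_unique(out, n, node_dist[i][j]);
	return n;
}

/* Returns true if d is one of the first n entries of v[]. */
static bool contains(const int *v, size_t n, int d)
{
	for (size_t i = 0; i < n; i++)
		if (v[i] == d)
			return true;
	return false;
}
```

Called with buffers of NR_NODES * NR_NODES ints, unique_from_node0()
finds 4 distances and unique_from_all() finds 5; the extra one is 24,
which never shows up in node 0's row.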

Thanks
Barry