Re: 5.11-rc4+git: Shortest NUMA path spans too many nodes

From: Valentin Schneider
Date: Thu Jan 21 2021 - 10:09:01 EST



(+Cc relevant folks)

Hi,

On 21/01/21 15:41, Meelis Roos wrote:
> This happens on Sun Fire X4600 M2 - 32 cores in 8 CPU slots. 5.10 was silent. Current git and
> 5.10.0-13256-g5814bc2d4cc2 exhibit this message in dmesg but otherwise seem to work fine
> (kernel compilation succeeds).
>

b5b217346de8 ("sched/topology: Warn when NUMA diameter > 2") was added in
5.11-rc1, and I believe was marked for stable.

It doesn't come with a scheduler behaviour change; it only catches
topologies that end up being misrepresented / misinterpreted by the
scheduler, silently so unless run with CONFIG_SCHED_DEBUG=y.

Up until now I had only seen it fire on a single, somewhat unusual
topology. As fixing it is far from trivial, I figured adding this warning
would let us build a case for actually fixing it if we get some more
reports.

Could you paste the output of the below?

$ cat /sys/devices/system/node/node*/distance

Additionally, booting your system with CONFIG_SCHED_DEBUG=y and
appending 'sched_debug' to your cmdline should yield some extra data.
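
For anyone wanting to eyeball the distance table before posting it, here is
a rough sketch of how a hop-count "diameter" could be derived from that
output. This is not the check the kernel performs; it assumes (my
assumption, not something the distance table guarantees) that two nodes are
directly linked exactly when their distance equals the smallest remote
distance in the table. The function names and the sample table are made up
for illustration.

```python
from collections import deque


def parse_distances(text):
    """Parse one row of whitespace-separated integers per node."""
    rows = [[int(tok) for tok in line.split()]
            for line in text.splitlines() if line.strip()]
    if any(len(row) != len(rows) for row in rows):
        raise ValueError("distance table is not square")
    return rows


def hop_diameter(dist):
    """Return the longest shortest path between two nodes, in hops.

    ASSUMPTION: nodes u and v share a direct link iff dist[u][v] equals
    the smallest remote (off-diagonal) distance in the table.
    """
    n = len(dist)
    if n < 2:
        return 0
    link = min(dist[i][j] for i in range(n) for j in range(n) if i != j)
    worst = 0
    for src in range(n):
        # Plain breadth-first search over the inferred link graph.
        hops = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in range(n):
                if v != u and v not in hops and dist[u][v] == link:
                    hops[v] = hops[u] + 1
                    queue.append(v)
        if len(hops) != n:
            raise ValueError("inferred link graph is disconnected")
        worst = max(worst, max(hops.values()))
    return worst


# Hypothetical 4-node chain (0-1-2-3), NOT data from the reported machine.
SAMPLE = """\
10 20 30 40
20 10 20 30
30 20 10 20
40 30 20 10
"""

print(hop_diameter(parse_distances(SAMPLE)))  # prints 3
```

A result above 2 on a real table would only be a hint that the topology is
of the kind the new warning is about. Note also that the node* glob sorts
lexically, so with ten or more nodes the rows would need reordering before
being fed to a script like this.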