low performance of kernel build with quite a few simultaneous jobs

From: sat
Date: Thu Aug 23 2018 - 21:08:56 EST


Hi,

I found that kernel builds are very slow for quite a few values of `n` in `make -j<n>`,
where `n` ranges from 1 to the number of logical CPUs (LCPUs) on my EPYC server.

`make -j<n>` performs fine with `n` roughly from 1 to 24 and from 64 to 96, but is slow
for other values of `n` (please refer to the attached images). I suspect a task scheduling
problem on CPUs with a complicated topology like EPYC. Moreover, it can't be explained by
node distance alone, because this behavior doesn't disappear when memory interleaving is
enabled.

Do you have any idea what causes this strange behavior? I'm asking you because you wrote the
patch for better scheduling on recent AMD CPUs, commit 051f3ca02e46432c0965e8948f00c07d8a2f09c0
("sched/topology: Introduce NUMA identity node sched domain").

Here is the detailed information.

** Hardware Environment

- server: Super Micro AS-1023US-TR4
- CPU: EPYC 7451 x 2 (48 cores, 96 threads)
- RAM: MEM-DR432L-HL01-ER26 (32GB * 16, 512GB in total)

** Software Environment

- OS: Ubuntu 18.04
- kernel: v4.18.0
- build target kernel: v4.18.0

** Test load

Run the following commands.

```
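make defconfig
ncpus=$(nproc)    # number of online LCPUs
for ((i=1; i<=ncpus; i++)); do
    # bash's `time` reports on stderr; append so that every run is kept
    make clean && ( time make -j$i ) >>result.txt 2>&1
done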
```

I measured the build time as described above for each of the following settings.

- all: 96 LCPUs: all LCPUs online
- 1sock: 48 LCPUs: only the LCPUs in one socket online
- 1die: 12 LCPUs: only the LCPUs in one die online
- 1ccx: 6 LCPUs: only the LCPUs in one CCX online
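
The message does not say how the LCPUs were taken offline for these settings. As a minimal sketch only, assuming the standard Linux CPU hotplug files under `/sys/devices/system/cpu`, something like the following could be used. The function name `keep_online` is hypothetical, and which LCPU IDs belong to a given socket, die, or CCX has to be looked up on the actual machine (for example with `lscpu -e`).

```shell
# Keep only the listed LCPUs online and take all others offline.
# Hypothetical helper; needs root when run against the real sysfs tree.
# usage: keep_online "<space-separated LCPU ids>" [sysfs cpu directory]
keep_online() {
    local keep=" $1 "
    local root="${2:-/sys/devices/system/cpu}"
    local dir id
    for dir in "$root"/cpu[0-9]*; do
        # cpu0 often has no "online" file because it cannot be offlined
        [ -e "$dir/online" ] || continue
        id="${dir##*/cpu}"
        case "$keep" in
            *" $id "*) echo 1 > "$dir/online" ;;
            *)         echo 0 > "$dir/online" ;;
        esac
    done
}
```

For example, `keep_online "0 1 2"` would leave only LCPUs 0 to 2 online; that ID list is a placeholder, not the numbering of the server described here.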

** Result

Please refer to the attached files.

- 4.18.0-no-interleave.png: memory interleaving disabled in UEFI

The node distances reported by `numactl -H` are as follows.

```
node   0   1   2   3   4   5   6   7
  0:  10  16  16  16  32  32  32  32
  1:  16  10  16  16  32  32  32  32
  2:  16  16  10  16  32  32  32  32
  3:  16  16  16  10  32  32  32  32
  4:  32  32  32  32  10  16  16  16
  5:  32  32  32  32  16  10  16  16
  6:  32  32  32  32  16  16  10  16
  7:  32  32  32  32  16  16  16  10
```

- 4.18.0-hw-interleave.png: memory interleaving enabled in UEFI

All node distances are 10 in this case.
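
To check which of the two configurations is active without `numactl`, the same distances can be read from sysfs. This is a small sketch, assuming the usual `/sys/devices/system/node/node<N>/distance` files; the function name `node_distances` is hypothetical.

```shell
# Print one line per NUMA node in the form "<node>: <distances>".
# usage: node_distances [sysfs node directory]
node_distances() {
    local root="${1:-/sys/devices/system/node}"
    local dir
    for dir in "$root"/node[0-9]*; do
        [ -r "$dir/distance" ] || continue
        echo "${dir##*/node}: $(cat "$dir/distance")"
    done
}
```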

In the graphs, the x axis is `n` for `make -j<n>` and the y axis is the "real" value
reported by the `time` command for the kernel build described above.
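
For plotting, the "real" values have to be turned into plain seconds. The sketch below assumes the output of bash's `time` keyword (lines such as `real 1m23.456s`) was saved to a file, one `real` line per run in order of `n`; the function name `real_seconds` is hypothetical.

```shell
# Convert bash `time` lines such as "real 1m23.456s" to seconds,
# printing one value per "real" line found in the input.
# usage: real_seconds [file...]   (reads stdin when no file is given)
real_seconds() {
    awk '$1 == "real" {
        split($2, t, "m")      # t[1] = minutes, t[2] = seconds with trailing "s"
        sub(/s$/, "", t[2])
        printf "%.3f\n", t[1] * 60 + t[2]
    }' "$@"
}
```

Piping the result through `nl` then gives `n` and the elapsed seconds side by side, provided exactly one `real` line was recorded per run.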

In the attached images, all cases except 1ccx look strange.

Thanks,
Satoru Takeuchi

Attachment: 4.18.0-hw-interleave.png
Description: 4.18.0-hw-interleave.png

Attachment: 4.18.0-no-interleave.png
Description: 4.18.0-no-interleave.png