Re: [PATCH 0/2] x86: Remove ideal_nops[]

From: Sedat Dilek
Date: Sat Mar 13 2021 - 00:33:22 EST


On Fri, Mar 12, 2021 at 10:00 PM Borislav Petkov <bp@xxxxxxxxx> wrote:
>
> On Fri, Mar 12, 2021 at 12:32:53PM +0100, Peter Zijlstra wrote:
> > Since ultimate performance of a 10 year old chip (Intel Sandy Bridge, 2011) is
> > simply irrelevant today, remove variable NOPs and use NOPL.
>
> Just ran them on my SNB box:
>
> cpu family : 6
> model : 45
> model name : Intel(R) Xeon(R) CPU E5-1620 0 @ 3.60GHz
> stepping : 7
>
> with the usual perf stat kernel build workload with
> CONFIG_DYNAMIC_FTRACE and CONFIG_FUNCTION_TRACER where each function has
> a NOP at its beginning when ftrace is disabled (thx Steve).
>
> ./tools/perf/perf stat --repeat 5 --sync --pre=/root/bin/pre-build-kernel.sh -- make -s -j9 bzImage
>
> before: tip-master
>
> Performance counter stats for 'make -s -j9 bzImage' (5 runs):
>
> 3,213,728.10 msec task-clock # 7.307 CPUs utilized ( +- 0.01% )
> 339,270 context-switches # 0.106 K/sec ( +- 0.09% )
> 31,472 cpu-migrations # 0.010 K/sec ( +- 0.64% )
> 62,070,684 page-faults # 0.019 M/sec ( +- 0.01% )
> 11,498,198,009,323 cycles # 3.578 GHz ( +- 0.01% ) (83.33%)
> 8,235,957,366,696 stalled-cycles-frontend # 71.63% frontend cycles idle ( +- 0.01% ) (83.33%)
> 5,976,456,688,814 stalled-cycles-backend # 51.98% backend cycles idle ( +- 0.02% ) (66.67%)
> 7,553,156,344,376 instructions # 0.66 insn per cycle
> # 1.09 stalled cycles per insn ( +- 0.00% ) (83.33%)
> 1,635,468,917,524 branches # 508.901 M/sec ( +- 0.00% ) (83.34%)
> 51,888,292,932 branch-misses # 3.17% of all branches ( +- 0.02% ) (83.33%)
>
> 439.809 +- 0.156 seconds time elapsed ( +- 0.04% )
>
>
> after: tip-master-nops
>
> Performance counter stats for 'make -s -j9 bzImage' (5 runs):
>
> 3,217,113.67 msec task-clock # 7.307 CPUs utilized ( +- 0.03% )
> 339,425 context-switches # 0.106 K/sec ( +- 0.20% )
> 31,724 cpu-migrations # 0.010 K/sec ( +- 0.54% )
> 62,027,130 page-faults # 0.019 M/sec ( +- 0.01% )
> 11,508,779,965,901 cycles # 3.577 GHz ( +- 0.03% ) (83.34%)
> 8,241,212,210,440 stalled-cycles-frontend # 71.61% frontend cycles idle ( +- 0.04% ) (83.33%)
> 5,982,615,533,177 stalled-cycles-backend # 51.98% backend cycles idle ( +- 0.06% ) (66.66%)
> 7,546,407,430,314 instructions # 0.66 insn per cycle
> # 1.09 stalled cycles per insn ( +- 0.00% ) (83.33%)
> 1,634,187,006,479 branches # 507.967 M/sec ( +- 0.00% ) (83.33%)
> 51,941,580,371 branch-misses # 3.18% of all branches ( +- 0.01% ) (83.33%)
>
> 440.266 +- 0.195 seconds time elapsed ( +- 0.04% )
>
>
> So here's numbers talk, bullshit walks. And with those numbers no
> bullshit can remain lingering around anyway.
>

Here are my numbers.

My CPU:

cpu family : 6
model : 42
model name : Intel(R) Core(TM) i5-2467M CPU @ 1.60GHz
stepping : 7

My base was Linus Git:

$ git describe master
v5.12-rc2-338-gf78d76e72a46

I used Peter's patchset plus a required pre-patch so that it cleanly
applies against Linus Git:

x86/jump_label: Mark arguments as const to satisfy asm constraints
x86: Remove dynamic NOP selection
objtool,x86: Use asm/nops.h

My benchmark was a Linux kernel build with LLVM/Clang v12.0.0-rc3
on Debian/testing AMD64.

Patchset applied for a first build:

Performance counter stats for 'make V=1 -j4 LLVM=1 LLVM_IAS=1 PAHOLE=/opt/pahole/bin/pahole LOCALVERSION=-7-amd64-clang12-cfi KBUILD_VERBOSE=1 KBUILD_BUILD_HOST=iniza KBUILD_BUILD_USER=sedat.dilek@xxxxxxxxx KBUILD_BUILD_TIMESTAMP=2021-03-12 bindeb-pkg KDEB_PKGVERSION=5.12.0~rc2-7~bullseye+dileks1':

55605704.79 msec task-clock # 3.568 CPUs utilized
8317406 context-switches # 0.150 K/sec
261843 cpu-migrations # 0.005 K/sec
288312867 page-faults # 0.005 M/sec
107642573933061 cycles # 1.936 GHz
82531165255218 stalled-cycles-frontend # 76.67% frontend cycles idle
64932777217096 stalled-cycles-backend # 60.32% backend cycles idle
59591288273663 instructions # 0.55 insn per cycle
# 1.38 stalled cycles per insn
10906545460023 branches # 196.141 M/sec
489809039153 branch-misses # 4.49% of all branches

15582.829443660 seconds time elapsed

53102.403996000 seconds user
2547.134916000 seconds sys

Second build: the same code base, rebuilt after booting into the
kernel that has the above patchset applied:

Performance counter stats for 'make V=1 -j4 LLVM=1 LLVM_IAS=1 PAHOLE=/opt/pahole/bin/pahole LOCALVERSION=-8-amd64-clang12-cfi KBUILD_VERBOSE=1 KBUILD_BUILD_HOST=iniza KBUILD_BUILD_USER=sedat.dilek@xxxxxxxxx KBUILD_BUILD_TIMESTAMP=2021-03-13 bindeb-pkg KDEB_PKGVERSION=5.12.0~rc2-8~bullseye+dileks1':

56976758.12 msec task-clock # 3.589 CPUs utilized
8334519 context-switches # 0.146 K/sec
269340 cpu-migrations # 0.005 K/sec
288451841 page-faults # 0.005 M/sec
110795226760909 cycles # 1.945 GHz
85643743105935 stalled-cycles-frontend # 77.30% frontend cycles idle
68146424096780 stalled-cycles-backend # 61.51% backend cycles idle
59559370217381 instructions # 0.54 insn per cycle
# 1.44 stalled cycles per insn
10902087911812 branches # 191.343 M/sec
490447660403 branch-misses # 4.50% of all branches

15875.267204283 seconds time elapsed

54502.552543000 seconds user
2519.914516000 seconds sys

Simply comparing the build times:
~15583 s vs. ~15875 s, i.e. about 292 s (approx. 5 min, roughly 1.9%)
more build time.
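
For reference, the comparison can be reproduced with a few lines of Python. This is only a sketch: the values are copied by hand from the two perf stat outputs above, the dictionary keys and function names are made up for illustration, and each build appears to be a single run (no --repeat), so there is no noise estimate to weigh the differences against.

```python
# Counters copied from the two perf stat outputs above.
# Key names are illustrative, not perf's own field names.
first_build = {
    "elapsed_s": 15582.829443660,
    "task_clock_ms": 55605704.79,
    "cycles": 107642573933061,
    "instructions": 59591288273663,
}
second_build = {
    "elapsed_s": 15875.267204283,
    "task_clock_ms": 56976758.12,
    "cycles": 110795226760909,
    "instructions": 59559370217381,
}


def pct_change(before, after):
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100.0


def compare(before, after):
    """Percent change for every counter present in 'before'."""
    return {name: pct_change(before[name], after[name]) for name in before}


if __name__ == "__main__":
    delta_s = second_build["elapsed_s"] - first_build["elapsed_s"]
    print(f"elapsed delta: {delta_s:.1f} s ({delta_s / 60:.1f} min)")
    for name, pct in compare(first_build, second_build).items():
        print(f"{name}: {pct:+.2f}%")
```

The instruction count barely moves between the two builds while cycles and elapsed time go up by a few percent. With one run per configuration it is hard to say how much of that is run-to-run variation.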

Attached are my linux-configs and above mentioned build-times (in case
Gmail has truncated them).

- Sedat -

Attachment: config-5.12.0-rc2-7-amd64-clang12-cfi
Description: Binary data

Attachment: config-5.12.0-rc2-8-amd64-clang12-cfi
Description: Binary data
