Re: [PATCH] kbuild: move -pipe to global KBUILD_CFLAGS

From: Nathan Chancellor
Date: Sat Feb 22 2020 - 03:02:14 EST


On Fri, Feb 21, 2020 at 11:01:24PM -0500, Alex Xu (Hello71) wrote:
> Excerpts from Nathan Chancellor's message of February 21, 2020 9:16 pm:
> > Hi Alex,
> >
> > On Fri, Feb 21, 2020 at 07:38:20PM -0500, Alex Xu (Hello71) wrote:
> >> -pipe reduces unnecessary disk wear for systems where /tmp is not a
> >> tmpfs, slightly increases compilation speed, and avoids leaving behind
> >> files when gcc crashes.
> >>
> >> According to the gcc manual, "this fails to work on some systems where
> >> the assembler is unable to read from a pipe; but the GNU assembler has
> >> no trouble". We already require GNU ld on all platforms, so this is not
> >> an additional dependency. LLVM as also supports pipes.
> >>
> >> -pipe has always been used for most architectures, this change
> >> standardizes it globally. Most notably, arm, arm64, riscv, and x86 are
> >> affected.
> >>
> >> Signed-off-by: Alex Xu (Hello71) <alex_y_xu@xxxxxxxx>
> >
> > Do you have any numbers to show this is actually beneficial from a
> > compilation time perspective? I ask because I saw an improvement in
> > compilation time when removing -pipe from x86's KBUILD_CFLAGS in
> > commit 437e88ab8f9e ("x86/build: Remove -pipe from KBUILD_CFLAGS").
> >
> > For what it's worth, clang ignores -pipe so this does not actually
> > matter for its integrated assembler.
> >
> > That type of change could have been a fluke but I guarantee people
> > will care more about any change in compilation time than any of the
> > other things that you mention so it might be wise to check on major
> > architectures to make sure that it doesn't hurt.
> >
> > Cheers,
> > Nathan
> >
>
> Sorry, I should've checked the performance first. I have now run:
>
> cd /tmp/linux # previously: make O=/tmp/linux
> export MAKEFLAGS=12 # Ryzen 1600, 6 cores, 12 threads
> make allnoconfig
> for i in {1..10}; do
> make clean >/dev/null
> time make XPIPE=-pipe >/dev/null
> make clean >/dev/null
> time make >/dev/null
> done
>
> after patching -pipe to $(XPIPE) in Makefile.
>
> Results (without ld warnings):
>
> make > /dev/null 130.54s user 10.41s system 969% cpu 14.537 total
> make XPIPE=-pipe > /dev/null 129.83s user 9.95s system 977% cpu 14.296 total
> make > /dev/null 129.73s user 10.28s system 966% cpu 14.493 total
> make XPIPE=-pipe > /dev/null 130.04s user 10.63s system 986% cpu 14.252 total
> make > /dev/null 129.53s user 10.28s system 972% cpu 14.379 total
> make XPIPE=-pipe > /dev/null 130.29s user 10.17s system 983% cpu 14.288 total
> make > /dev/null 130.19s user 10.52s system 968% cpu 14.530 total
> make XPIPE=-pipe > /dev/null 129.90s user 10.47s system 978% cpu 14.343 total
> make > /dev/null 129.50s user 10.81s system 959% cpu 14.620 total
> make XPIPE=-pipe > /dev/null 130.37s user 10.60s system 975% cpu 14.446 total
> make > /dev/null 129.63s user 10.18s system 972% cpu 14.374 total
> make XPIPE=-pipe > /dev/null 131.29s user 9.92s system 1016% cpu 13.899 total
> make > /dev/null 129.96s user 10.39s system 961% cpu 14.596 total
> make XPIPE=-pipe > /dev/null 131.63s user 10.16s system 1011% cpu 14.015 total
> make > /dev/null 129.33s user 10.54s system 970% cpu 14.405 total
> make XPIPE=-pipe > /dev/null 129.70s user 10.40s system 976% cpu 14.349 total
> make > /dev/null 129.53s user 10.25s system 964% cpu 14.494 total
> make XPIPE=-pipe > /dev/null 130.38s user 10.62s system 973% cpu 14.479 total
> make > /dev/null 130.73s user 10.08s system 957% cpu 14.704 total
> make XPIPE=-pipe > /dev/null 130.43s user 10.62s system 985% cpu 14.309 total
> make > /dev/null 130.54s user 10.41s system 969% cpu 14.537 total
>
> There is a fair bit of variance, probably due to cpufreq, schedutil, CPU
> temperature, CPU scheduler, motherboard power delivery, etc. But, I
> think it can be clearly seen that -pipe is, on average, about 0.1 to 0.2
> seconds faster.
>
> I also tried "make defconfig":
>
> make > /dev/null 1238.26s user 102.39s system 1095% cpu 2:02.33 total
> make XPIPE=-pipe > /dev/null 1231.33s user 102.52s system 1081% cpu 2:03.29 total
> make > /dev/null 1232.92s user 102.07s system 1096% cpu 2:01.71 total
> make XPIPE=-pipe > /dev/null 1239.59s user 102.30s system 1096% cpu 2:02.39 total
> make > /dev/null 1229.81s user 101.72s system 1093% cpu 2:01.74 total
> make XPIPE=-pipe > /dev/null 1234.64s user 101.30s system 1098% cpu 2:01.64 total
> make > /dev/null 1228.50s user 104.39s system 1093% cpu 2:01.91 total
> make XPIPE=-pipe > /dev/null 1238.78s user 102.57s system 1099% cpu 2:01.99 total
> make > /dev/null 1238.26s user 102.39s system 1095% cpu 2:02.33 total
>
> I stopped after this because I needed to use the machine for other
> tasks. The results are less clear, but I think there's not a big
> difference one way or another, at least on my machine.
>
> CPU: Ryzen 1600, overclocked to ~3.8 GHz
> RAM: Corsair Vengeance, overclocked to ~3300 MHz, forgot timings
> Motherboard: ASRock B450 Pro4
>
> I would speculate that the recent pipe changes have caused a change in
> the relative speed compared to 2018. I am using 5.6.0-rc2 with -O3
> -march=native patches.
>
> Regards,
> Alex.

I used hyperfine [1] to run a quick benchmark with a freshly built
GCC 9.2.0 for x86 and aarch64 and here are the results:

$ hyperfine -w 1 -r 25 \
-p 'rm -rf out.x86_64' \
'make -j$(nproc) -s CROSS_COMPILE=x86_64-linux- O=out.x86_64 defconfig all' \
'make -j$(nproc) -s CROSS_COMPILE=x86_64-linux- KCFLAGS=-pipe O=out.x86_64 defconfig all'

Benchmark #1: make -j$(nproc) -s CROSS_COMPILE=x86_64-linux- O=out.x86_64 defconfig all
Time (mean ± σ): 68.535 s ± 0.275 s [User: 2241.681 s, System: 185.454 s]
Range (min … max): 67.855 s … 68.953 s 25 runs

Benchmark #2: make -j$(nproc) -s CROSS_COMPILE=x86_64-linux- KCFLAGS=-pipe O=out.x86_64 defconfig all
Time (mean ± σ): 68.922 s ± 0.095 s [User: 2264.168 s, System: 190.297 s]
Range (min … max): 68.781 s … 69.126 s 25 runs

Summary
'make -j$(nproc) -s CROSS_COMPILE=x86_64-linux- O=out.x86_64 defconfig all' ran
1.01 ± 0.00 times faster than 'make -j$(nproc) -s CROSS_COMPILE=x86_64-linux- KCFLAGS=-pipe O=out.x86_64 defconfig all'

$ hyperfine -w 1 -r 25 \
-p 'rm -rf out.aarch64' \
'make -j$(nproc) -s ARCH=arm64 CROSS_COMPILE=aarch64-linux- O=out.aarch64 defconfig all' \
'make -j$(nproc) -s ARCH=arm64 CROSS_COMPILE=aarch64-linux- KCFLAGS=-pipe O=out.aarch64 defconfig all'

Benchmark #1: make -j$(nproc) -s ARCH=arm64 CROSS_COMPILE=aarch64-linux- O=out.aarch64 defconfig all
Time (mean ± σ): 166.732 s ± 0.594 s [User: 5654.780 s, System: 475.493 s]
Range (min … max): 165.873 s … 167.859 s 25 runs

Benchmark #2: make -j$(nproc) -s ARCH=arm64 CROSS_COMPILE=aarch64-linux- KCFLAGS=-pipe O=out.aarch64 defconfig all
Time (mean ± σ): 168.047 s ± 0.428 s [User: 5734.031 s, System: 488.392 s]
Range (min … max): 167.328 s … 168.959 s 25 runs

Summary
'make -j$(nproc) -s ARCH=arm64 CROSS_COMPILE=aarch64-linux- O=out.aarch64 defconfig all' ran
1.01 ± 0.00 times faster than 'make -j$(nproc) -s ARCH=arm64 CROSS_COMPILE=aarch64-linux- KCFLAGS=-pipe O=out.aarch64 defconfig all'

In both cases performance appears to regress (by about 1%, but still).
Maybe it is my machine, though this benchmark was run on a different
machine than the one from my commit back in 2018.
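
For anyone who wants to redo the arithmetic, the relative difference can
be worked out from the mean times hyperfine reported above. This is only
a minimal sketch, assuming a POSIX shell with awk; the function name
rel_slowdown is made up for illustration and is not part of hyperfine or
kbuild:

```shell
# Relative slowdown, in percent, of the -pipe build versus the baseline.
# $1: baseline mean in seconds, $2: -pipe mean in seconds.
rel_slowdown() {
    awk -v base="$1" -v pipe="$2" \
        'BEGIN { printf "%.2f\n", (pipe - base) / base * 100 }'
}

rel_slowdown 68.535 68.922      # x86_64 means from the run above
rel_slowdown 166.732 168.047    # aarch64 means from the run above
```

With those means this prints 0.56 and 0.79, so the "1.01 times faster"
in the hyperfine summary is a rounded figure and the measured difference
is somewhat under 1% on both architectures.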

I am not sure I would write off these results, since I ran each
benchmark 25 times back to back, which should average out most of the
variance that you described.

[1]: https://github.com/sharkdp/hyperfine

Cheers,
Nathan