Re: [PATCH -tip v3 09/11] data_race: Avoid nested statement expression

From: Arnd Bergmann
Date: Tue May 26 2020 - 06:42:37 EST


On Thu, May 21, 2020 at 10:21 PM 'Nick Desaulniers' via Clang Built
Linux <clang-built-linux@xxxxxxxxxxxxxxxx> wrote:
>
> On Thu, May 21, 2020 at 7:22 AM 'Marco Elver' via Clang Built Linux
> <clang-built-linux@xxxxxxxxxxxxxxxx> wrote:
> >
> > It appears that compilers have trouble with nested statement
> > expressions. Therefore remove one level of statement expression nesting
> > from the data_race() macro. This will help us avoid potential problems
> > in future as its usage increases.
> >
> > Link: https://lkml.kernel.org/r/20200520221712.GA21166@xxxxxxx
> > Acked-by: Will Deacon <will@xxxxxxxxxx>
> > Signed-off-by: Marco Elver <elver@xxxxxxxxxx>
>
> Thanks Marco, I can confirm this series fixes the significant build
> time regressions.
>
> Tested-by: Nick Desaulniers <ndesaulniers@xxxxxxxxxx>
>
> More measurements in: https://github.com/ClangBuiltLinux/linux/issues/1032
>
> Might want:
> Reported-by: Borislav Petkov <bp@xxxxxxx>
> Reported-by: Nathan Chancellor <natechancellor@xxxxxxxxx>
> too.

I find this patch only solves half the problem: it's much faster than
without the
patch, but still much slower than the current mainline version. As far as I'm
concerned, I think the build speed regression compared to mainline is not yet
acceptable, and we should try harder.

I have not looked too deeply at it yet, but this is what I found from looking
at a file in a randconfig build:

Configuration: see https://pastebin.com/raw/R9erCwNj

== Current linux-next ==
with "data_race: Avoid nested statement expression"
and "compiler.h: Remove data_race() and unnecessary checks from
{READ,WRITE}_ONCE()"

$ touch fs/ocfs2/journal.c ; cp
../arch/x86/configs/0xFFA843AA_defconfig obj-x86/.config ; perf stat
make olddefconfig O=obj-x86/ CC=clang-11 fs/ocfs2/journal.i ARCH=x86
-skj30 ; wc obj-x86/fs/ocfs2/journal.i
48741 552950 9010050 obj-x86/fs/ocfs2/journal.i
real 0m12.514s
user 0m10.270s
sys 0m2.668s

== Same tree, without those two ==

$ touch fs/ocfs2/journal.c cp ../arch/x86/configs/0xFFA843AA_defconfig
obj-x86/.config ; time make olddefconfig O=obj-x86/ CC=clang-11
fs/ocfs2/journal.i ARCH=x86 -skj30 ; wc obj-x86/fs/ocfs2/journal.i
real 1m35.968s
user 1m33.579s
sys 0m3.523s
48741 1926607 36542560 obj-x86/fs/ocfs2/journal.i

== Mainline Linux ==

$ touch fs/ocfs2/journal.c ; cp
../arch/x86/configs/0xFFA843AA_defconfig obj-x86/.config ; time make
olddefconfig O=obj-x86/ CC=clang-11 fs/ocfs2/journal.i ARCH=x86 -skj30
; wc obj-x86/fs/ocfs2/journal.i

real 0m6.529s
user 0m4.389s
sys 0m2.561s
47377 377887 4178633 obj-x86/fs/ocfs2/journal.i

So both the size of the preprocessed file and the time to preprocess it are
still twice as bad for linux-next compared to mainline. Actually compiling the
preprocessed filed is very quick, as I guess only the preprocessing seems to
use all the time.

Arnd