Re: [PATCH v2] lib/string.c: Use freestanding environment

From: Arvind Sankar
Date: Wed Aug 19 2020 - 15:06:31 EST


On Wed, Aug 19, 2020 at 11:35:11AM -0700, Nick Desaulniers wrote:
> On Wed, Aug 19, 2020 at 7:08 AM Arvind Sankar <nivedita@xxxxxxxxxxxx> wrote:
> >
> > gcc can transform the loop in a naive implementation of memset/memcpy
> > etc into a call to the function itself. This optimization is enabled by
> > -ftree-loop-distribute-patterns.
> >
> > This has been the case for a while (see eg [0]), but gcc-10.x enables
> > this option at -O2 rather than -O3 as in previous versions.
> >
> > Add -ffreestanding, which implicitly disables this optimization with
> > gcc. It is unclear whether clang performs such optimizations, but
> > hopefully it will also not do so in a freestanding environment.
> >
> > [0] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56888
> >
> > Signed-off-by: Arvind Sankar <nivedita@xxxxxxxxxxxx>
>
> For Clang:
> For x86_64 defconfig:
> This results in no change for the code generated.
>
> For aarch64 defconfig:
> This results in calls to bcmp being replaced with calls to memcmp in
> strstr and strnstr. I plan on adding -fno-built-bcmp then removing
> bcmp anyways. Not a bug either way, just noting the difference is
> disassembly.
>
> For arm defconfig:
> This results in no change for the code generated.
>
> I should check the other architectures we support, but my local build
> doesn't have all backends enabled currently; we'll catch it once it's
> being testing in -next if it's an issue, but I don't foresee it
> (knocks on wood, famous last words, ...)
>
> If it helps GCC not optimize these core functions into infinite
> recursion, I'm for that, especially since I'd bet these get called
> frequently and early on in boot, which is my least favorite time to
> debug.
>
> Reviewed-by: Nick Desaulniers <ndesaulniers@xxxxxxxxxx>
> Tested-by: Nick Desaulniers <ndesaulniers@xxxxxxxxxx>
>

I verified that arch/c6x with gcc-10 downloaded from kernel.org has the
broken memset with CC_OPTIMIZE_FOR_PERFORMANCE and gets fixed with this
patch. The default is optimize for size though, which doesn't seem to be
busted.

Thanks.