Re: [RESEND PATCH v5] x86, mem: move memmove to out of line assembler

From: Kees Cook
Date: Tue Nov 01 2022 - 18:37:30 EST


On Tue, Oct 18, 2022 at 01:31:34PM -0700, Nathan Chancellor wrote:
> On Tue, Oct 18, 2022 at 10:21:55AM -0700, Nick Desaulniers wrote:
> > When building ARCH=i386 with CONFIG_LTO_CLANG_FULL=y, it's possible
> > (depending on additional configs which I have not been able to isolate)
> > to observe a failure during register allocation:
> >
> > error: inline assembly requires more registers than available
> >
> > when memmove is inlined into tcp_v4_fill_cb() or tcp_v6_fill_cb().
> >
> > memmove is quite large and probably shouldn't be inlined due to size
> > alone. A noinline function attribute would be the simplest fix, but
> > there are a few things that stand out with the current definition:
> >
> > In addition to having complex constraints that can't always be resolved,
> > the clobber list seems to be missing %bx. By using numbered operands
> > rather than symbolic operands, the constraints are quite obnoxious to
> > refactor.
> >
> > Having a large function be 99% inline asm is a code smell that this
> > function should simply be written in stand-alone out-of-line assembler.
> >
> > Moving this to out of line assembler guarantees that the
> > compiler cannot inline calls to memmove.
> >
> > This has been done previously for 64b:
> > commit 9599ec0471de ("x86-64, mem: Convert memmove() to assembly file
> > and fix return value bug")
> >
> > That gives the opportunity for other cleanups like fixing the
> > inconsistent use of tabs vs spaces and instruction suffixes, and the
> > label 3 appearing twice. Symbolic operands, local labels, and
> > additional comments would provide this code with a fresh coat of paint.
> >
> > Finally, add a test that tickles the `rep movsl` implementation to test
> > it for correctness, since it has implicit operands.
> >
> > Suggested-by: Ingo Molnar <mingo@xxxxxxxxxx>
> > Suggested-by: David Laight <David.Laight@xxxxxxxxxx>
> > Reviewed-by: Kees Cook <keescook@xxxxxxxxxxxx>
> > Tested-by: Kees Cook <keescook@xxxxxxxxxxxx>
> > Signed-off-by: Nick Desaulniers <ndesaulniers@xxxxxxxxxx>
>
> I ran
>
> $ tools/testing/kunit/kunit.py run --arch i386 --cross_compile x86_64-linux- memcpy
>
> with GCC 6 through 12 from
> https://mirrors.edge.kernel.org/pub/tools/crosstool/ (my GCC 5 container
> is based on Ubuntu Xenial, which does not have a new enough Python for
> kunit.py) and
>
> $ tools/testing/kunit/kunit.py run --arch i386 --make_options LLVM=1 memcpy
>
> with LLVM 11 through 16 from Debian with this change on top of Kees's
> expanding of the memcpy() KUnit tests [1] and everything passed.
>
> Tested-by: Nathan Chancellor <nathan@xxxxxxxxxx>
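
For readers without a KUnit setup, the kind of overlap check described in
the patch description can be roughly sketched in userspace. This is only
an illustration under stated assumptions: it exercises the C library's
memmove(), not the kernel's i386 assembler routine, and the names
BUF_SIZE, fill_pattern(), ref_move() and check_overlapping_memmove() are
hypothetical helpers, not part of the patch or of the memcpy KUnit suite.
It compares memmove() against a byte-at-a-time reference for forward and
backward overlapping copies, over lengths that include both multiples of
four and odd tails.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BUF_SIZE 64	/* hypothetical buffer size for this sketch */

/* Fill a buffer with a position-dependent pattern so misplaced bytes show up. */
static void fill_pattern(unsigned char *buf)
{
	for (size_t i = 0; i < BUF_SIZE; i++)
		buf[i] = (unsigned char)(i + 1);
}

/* Byte-at-a-time reference move within one buffer, safe for overlap. */
static void ref_move(unsigned char *buf, size_t dst_off, size_t src_off,
		     size_t len)
{
	assert(dst_off + len <= BUF_SIZE && src_off + len <= BUF_SIZE);

	if (dst_off < src_off) {
		/* Destination below source: copy forward. */
		for (size_t i = 0; i < len; i++)
			buf[dst_off + i] = buf[src_off + i];
	} else {
		/* Destination at or above source: copy backward. */
		for (size_t i = len; i > 0; i--)
			buf[dst_off + i - 1] = buf[src_off + i - 1];
	}
}

/*
 * Returns the number of (len, src_off, dst_off) combinations for which
 * memmove() leaves the buffer in a different state than the reference.
 */
static int check_overlapping_memmove(void)
{
	unsigned char got[BUF_SIZE], want[BUF_SIZE];
	int failures = 0;

	for (size_t len = 0; len <= 40; len++) {
		for (size_t src_off = 0; src_off <= 16; src_off++) {
			for (size_t dst_off = 0; dst_off <= 16; dst_off++) {
				fill_pattern(got);
				fill_pattern(want);

				memmove(got + dst_off, got + src_off, len);
				ref_move(want, dst_off, src_off, len);

				/* Compare whole buffers to catch out-of-range writes too. */
				if (memcmp(got, want, BUF_SIZE) != 0)
					failures++;
			}
		}
	}
	return failures;
}
```

A caller would treat a non-zero return value from
check_overlapping_memmove() as a failure. Comparing the whole buffer
rather than just the destination range is a deliberate choice here, so
that a copy which writes outside [dst_off, dst_off + len) is also caught.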

Can an x86 maintainer please pick this up for -tip?

Thanks!

-Kees

--
Kees Cook