Re: [PATCH] compiler: enable CONFIG_OPTIMIZE_INLINING forcibly

From: Russell King - ARM Linux admin
Date: Tue Oct 01 2019 - 17:26:24 EST


On Tue, Oct 01, 2019 at 09:59:38PM +0100, Russell King - ARM Linux admin wrote:
> On Tue, Oct 01, 2019 at 01:21:44PM -0700, Nick Desaulniers wrote:
> > On Tue, Oct 1, 2019 at 11:14 AM Russell King - ARM Linux admin
> > <linux@xxxxxxxxxxxxxxx> wrote:
> > >
> > > On Tue, Oct 01, 2019 at 11:00:11AM -0700, Nick Desaulniers wrote:
> > > > On Tue, Oct 1, 2019 at 10:55 AM Russell King - ARM Linux admin
> > > > <linux@xxxxxxxxxxxxxxx> wrote:
> > > > >
> > > > > On Tue, Oct 01, 2019 at 10:44:43AM -0700, Nick Desaulniers wrote:
> > > > > > I apologize; I don't mean to be difficult. I would just like to avoid
> > > > > > surprises when code written with the assumption that it will be
> > > > > > inlined is not. It sounds like we found one issue in arm32 and one in
> > > > > > arm64 related to outlining. If we fix those two cases, I think we're
> > > > > > close to proceeding with Masahiro's cleanup, which I view as a good
> > > > > > thing for the health of the Linux kernel codebase.
> > > > >
> > > > > Except, using the C preprocessor for this turns the arm32 code into
> > > > > yuck:
> > > > >
> > > > > 1. We'd need to turn get_domain() and set_domain() into multi-line
> > > > > preprocessor macro definitions, using the GCC ({ }) extension
> > > > > so that get_domain() can return a value.
> > > > >
> > > > > 2. uaccess_save_and_enable() and uaccess_restore() also need to
> > > > > become preprocessor macro definitions too.
> > > > >
> > > > > So, we end up with multiple levels of nested preprocessor macros.
> > > > > When something goes wrong, the compiler warning/error message is
> > > > > going to be utterly _horrid_.
> > > >
> > > > That's why I preferred V1 of Masahiro's patch, that fixed the inline
> > > > asm not to make use of caller saved registers before calling a
> > > > function that might not be inlined.
> > >
> > > ... which I objected to based on the fact that this uaccess stuff is
> > > supposed to add protection against the kernel being fooled into
> > > accessing userspace when it shouldn't. The whole intention there is
> > > that [sg]et_domain(), and uaccess_*() are _always_ inlined as close
> > > as possible to the call site of the accessor touching userspace.
> >
> > Then use the C preprocessor to force the inlining. I'm sorry it's not
> > as pretty as static inline functions.
> >
> > >
> > > Moving it before the assignments mean that the compiler is then free
> > > to issue memory loads/stores to load up those registers, which is
> > > exactly what we want to avoid.
> > >
> > >
> > > In any case, I violently disagree with the idea that stuff we have
> > > in header files should be permitted not to be inlined because we
> > > have soo much that is marked inline.
> >
> > So there's a very important subtly here. There's:
> > 1. code that adds `inline` cause "oh maybe it would be nice to inline
> > this, but if it isn't no big deal"
> > 2. code that if not inlined is somehow not correct.
> > 3. avoid ODR violations via `static inline`
> >
> > I'll posit that "we have soo much that is marked inline [is
> > predominantly case 1 or 3, not case 2]." Case 2 is a code smell, and
> > requires extra scrutiny.
> >
> > > Having it moved out of line,
> > > and essentially the same function code appearing in multiple C files
> > > is really not an improvement over the current situation with excessive
> > > use of inlining. Anyone who has looked at the code resulting from
> > > dma_map_single() will know exactly what I'm talking about, which is
> > > way in excess of the few instructions we have for the uaccess_* stuff
> > > here.
> > >
> > > The right approach is to move stuff out of line - and by that, I
> > > mean _actually_ move the damn code, so that different compilation
> > > units can use the same instructions, and thereby gain from the
> > > whole point of an instruction cache.
> >
> > And be marked __attribute__((noinline)), otherwise might be inlined via LTO.
> >
> > >
> > > The whole "let's make inline not really mean inline" is nothing more
> > > than a band-aid to the overuse (and abuse) of "inline".
> >
> > Let's triple check the ISO C11 draft spec just to be sure:
> > § 6.7.4.6: A function declared with an inline function specifier is an
> > inline function. Making a
> > function an inline function suggests that calls to the function be as
> > fast as possible.
> > The extent to which such suggestions are effective is
> > implementation-defined. 139)
> > 139) For example, an implementation might never perform inline
> > substitution, or might only perform inline
> > substitutions to calls in the scope of an inline declaration.
> > § J.3.8 [Undefined Behavior] Hints: The extent to which suggestions
> > made by using the inline function specifier are effective (6.7.4).
> >
> > My translation:
> > "Please don't assume inline means anything."
> >
> > For the unspecified GNU C extension __attribute__((always_inline)), it
> > seems to me like it's meant more for performing inlining (an
> > optimization) at -O0. Whether the compiler warns or not seems like a
> > nice side effect, but provides no strong guarantee otherwise.
> >
> > I'm sorry that so much code may have been written with that
> > assumption, and I'm sorry to be the bearer of bad news, but this isn't
> > a recent change. If code was written under false assumptions, it
> > should be rewritten. Sorry.
>
> You may quote C11, but that is not relevent. The kernel is coded to
> gnu89 standard - see the -std=gnu89 flag.

There's more to this and why C11 is entirely irrelevant. The "inline"
you see in our headers is not the compiler keyword that you find in
various C standards, it is a macro that gets expanded to either:

#define inline inline __attribute__((__always_inline__)) __gnu_inline \
__maybe_unused notrace

or

#define inline inline __gnu_inline \
__maybe_unused notrace

__gnu_inline is defined as:

#define __gnu_inline __attribute__((__gnu_inline__))

So this attaches the gnu_inline attribute to the function:

`gnu_inline'
This attribute should be used with a function that is also declared
with the `inline' keyword. It directs GCC to treat the function
as if it were defined in gnu90 mode even when compiling in C99 or
gnu99 mode.
...
Since ISO C99 specifies a different semantics for `inline', this
function attribute is provided as a transition measure and as a
useful feature in its own right. This attribute is available in
GCC 4.1.3 and later. It is available if either of the
preprocessor macros `__GNUC_GNU_INLINE__' or
`__GNUC_STDC_INLINE__' are defined. *Note An Inline Function is
As Fast As a Macro: Inline.

which is quite clear that C99 semantics do not apply to _this_ inline.
The manual goes on to explain:

GCC implements three different semantics of declaring a function
inline. One is available with `-std=gnu89' or `-fgnu89-inline' or when
`gnu_inline' attribute is present on all inline declarations, another
when `-std=c99', `-std=c11', `-std=gnu99' or `-std=gnu11' (without
`-fgnu89-inline'), and the third is used when compiling C++.

I'd suggest gnu90 mode is pretty similar to gnu89 mode, and as we build
the kernel in gnu89 mode, that is the inlining definition that is
appropriate.

When it comes to __always_inline, the GCC manual is the definitive
reference, since we use the GCC attribute for that:

#define __always_inline inline __attribute__((__always_inline__))

and I've already quoted what the GCC manual says for always_inline.

Arguing about what the C11 spec says about inlining when we aren't
using C11 dialect in the kernel, but are using GCC features, does
not move the discussion on.

Thanks anyway, maybe it will become relevent in the future if we
decide to move to C11.

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up