Re: [patch] measurements, numbers about CONFIG_OPTIMIZE_INLINING=yimpact

From: Linus Torvalds
Date: Fri Jan 09 2009 - 14:45:34 EST




On Fri, 9 Jan 2009, Richard Guenther wrote:
>
> -fno-inline-functions-called-once disables the heuristic that always
> inlines (static!) functions that are called once. Other heuristics
> still apply, like inlining the static function if it is small.
> Everything else would be totally stupid - which seems to be the "default
> mode" you think GCC developers are in.

Well, I don't know about you, but the "don't inline a single instruction"
sounds a bit stupid to me. And yes, that's exactly what triggered this
whole thing.

We have two examples of gcc doing that, one of which was even a modern
version of gcc, where we had sone absolutely _everything_ on a source
level to make sure that gcc could not possibly screw up. Yet it did:

static inline int constant_test_bit(int nr, const volatile unsigned long *addr)
{
return ((1UL << (nr % BITS_PER_LONG)) &
(((unsigned long *)addr)[nr / BITS_PER_LONG])) != 0;
}

#define test_bit(nr, addr) \
(__builtin_constant_p((nr)) \
? constant_test_bit((nr), (addr)) \
: variable_test_bit((nr), (addr)))

in this case, Ingo said that changing that _single_ inline to forcing
inlining made a difference.

That's CRAZY. The thing isn't even called unless "nr" is constant, so
absolutely _everything_ optimizes away, and that whole function was
designed to give us a single instruction:

testl $constant,constant_offset(addr)

and nothing else.

Maybe there was something else going on, and maybe Ingo's tests were off,
but this is an example of gcc not inlining WHEN WE TOLD IT TO, and when
the function was a single instruction.

How can anybody possibly not consider that to be "stupid"?

The other case (with a single "cmpxchg" inline asm instruction) was at
least _slightly_ more understandable, in that (a) Ingo claims modern gcc's
did inline it and (b) the original function actually has a "switch()"
statement that depends on the argument that is constant, so a stupid
inliner might believe that it's a big function. But again, we _told_ the
compiler to inline the damn thing, because we knew better. But gcc didn't.

The other part that is crazy is when gcc inlines large functions that
aren't even called most of the time (the "ioctl()" switch statements tend
to be a great example of this - gcc inlines ten or twenty functions, and
we can guarantee that only one of them is ever called). Yes, maybe it
makes the code smaller, but it makes the code also undebuggable and often
BUGGY, because we now have the stack frame of all ten-to-twenty functions
to contend with.

And notice how "static" has absolutely _zero_ meaning for the above
example. Yes, the thing is called just from one place - that's how
something like that very much works. It's a special case. It's not _worth_
inlining, especially if it causes bugs. So "called once" or "static" is
actually totally irrelevant.

And no, they are not marked "inline" (although they are clearly also not
marked "uninline", until we figure out that gcc is causing system crashes,
and we add the thing).

If these two small problems were fixed, gcc inlining would work much
better. But the first one, in particular, means that the "do I inline or
not" decision would have to happen after expanding and simplifying
constants. And then, if the end result is big, the inlining gets aborted.

Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/