Re: 2.2.0 and egcs 1.1 was Re: Sorry, wrong gcc-version

Andi Kleen (ak@muc.de)
Mon, 26 Oct 1998 10:20:24 +0100


On Mon, Oct 26, 1998 at 08:53:40AM +0100, kwrohrer@ce.mediaone.net wrote:
> The code I've noticed uses macros when it wants stuff inlined, and uses
> explicit inlining only as a way to write very long functions as multiple
> functions (for readability) yet avoid the function call overhead. I
> haven't seen explicit inlining anywhere else, and I've seen some huge
> macros I'd give a compiler the option to not inline...but I didn't write
> the code and I haven't read the whole kernel.

The reason is that macros generate better code on gcc than inline functions
(sad but true).
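
To make the comparison concrete, here is a minimal sketch of the two forms
being discussed. The names MAX_MACRO and max_inline are made up for this
illustration and do not come from the kernel; which form generates better
code depends on the compiler version, so compare the assembly output
(gcc -S) rather than take it on faith.

```c
/* Hypothetical example: the same operation written both ways. */

/* Macro form: expanded textually at every use, so the compiler only ever
 * sees one big function. Note that an argument with side effects (such as
 * i++) may be evaluated twice. */
#define MAX_MACRO(a, b) ((a) > (b) ? (a) : (b))

/* Inline-function form: arguments are evaluated exactly once and are
 * type-checked, but the quality of the code depends on the inliner. */
static inline int max_inline(int a, int b)
{
    return a > b ? a : b;
}
```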

>
> So, what makes it a win for all architectures to not suggest that the
> compiler inline functions when it believes it would be faster, just
> because a few functions ought not to be inlined and a few functions
> which probably should be inlined are marked as such? I'd say tell the
> compiler to inline and tell it what to not inline. That goes double for
> loop unrolling; to mangle a metaphor, what's good for the goose is not
> always good for the duck or the swan.

It works differently. If you specify -finline-functions, the compiler inlines
every function it is able to inline (as long as it does not exceed a specific
size). It has no way to know, and doesn't care, whether inlining is a good
idea, because it doesn't know how often the code is executed. Other, more
advanced compilers use a technique called 'profile feedback' to solve that
dilemma (you profile the program first and feed the data back to the
compiler), but that currently doesn't work in gcc[1].

If you don't believe me, see for yourself: in the gcc/egcs source, look at
integrate.c:function_cannot_inline_p()

The solution is to not use -finline-functions or -O3, and to specify inline
attributes explicitly for fast path functions. Linux already does this.
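
A minimal sketch of that pattern, assuming a made-up ring buffer (struct
ring, ring_put and ring_put_slow are hypothetical names, not kernel code).
Only the short fast path carries an explicit inline; the rarely taken path
stays an ordinary function, and the file would be built with -O2 and without
-finline-functions so the compiler does not inline it on its own.

```c
#include <stddef.h>

/* Hypothetical ring buffer, for illustration only. */
struct ring {
    unsigned char buf[64];
    size_t head;    /* next slot to write */
    size_t count;   /* bytes currently stored */
};

/* Slow path: deliberately NOT marked inline. It stands in for rarely
 * needed handling (here it simply rejects the byte). */
int ring_put_slow(struct ring *r, unsigned char c);

/* Fast path: short, executed often, explicitly marked inline. */
static inline int ring_put(struct ring *r, unsigned char c)
{
    if (r->count < sizeof r->buf) {
        r->buf[r->head] = c;
        r->head = (r->head + 1) % sizeof r->buf;
        r->count++;
        return 0;
    }
    return ring_put_slow(r, c);   /* rare: buffer is full */
}

int ring_put_slow(struct ring *r, unsigned char c)
{
    (void)r;
    (void)c;
    return -1;   /* placeholder for the uncommon case */
}
```

Every caller of ring_put() gets the few fast-path instructions inlined, while
the full-buffer case costs one ordinary function call.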

In gcc there is another problem: the x86 architecture really doesn't have
enough registers. In gcc a variable can only get a register for its complete
lifetime[2]. Now when an inline function is included that needs lots of
registers, it forces other, more important variables out of registers and
onto the stack - even when the actual inline function code is executed only
rarely. In this case a function call is a lot cheaper, because it allows both
functions to have better register allocation (and it doesn't bloat the L1
cache footprint).
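
As a rough sketch of that situation (rare_fixup and sum_clamped are
hypothetical names, and the arithmetic is meaningless filler): the helper
keeps many values live at once but runs only rarely. Keeping it out of line
means its register needs do not compete with the loop's variables. The
__attribute__((noinline)) used here is an extension found in later gcc
versions, so it is an assumption for this example; with the compilers
discussed in this thread you get the same effect by not marking the function
inline and not using -finline-functions.

```c
#include <stddef.h>

/* Rarely executed helper with many simultaneously live values.
 * noinline is a later GCC extension (assumption for this sketch). */
__attribute__((noinline))
static long rare_fixup(long v)
{
    long a = v * 3, b = v ^ 0x55, c = v >> 2, d = v + 7;
    long e = a - b, f = c * d;
    return (a + b + c + d + e + f) % 1000;
}

/* Hot loop: only total, i, n and vals need to stay in registers. */
long sum_clamped(const long *vals, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        long v = vals[i];
        if (v > 1000)           /* assumed to be the rare case */
            v = rare_fixup(v);  /* real call: helper gets its own allocation */
        total += v;
    }
    return total;
}
```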

> > compiler has to manage lots of variables in only a few registers, and both
> > loop unrolling and automatic inlining eat registers a lot.
>
> If that's a problem, then the compiler should balance the spillage against
> the potential gain (and the profiling data or hints, if any) and do a
> reasonable approximation of the Right Thing(tm).

It should, but it doesn't currently. Linux has to work with existing gcc
versions, not hypothetical ones. Actually, this is another reason to use gcc
2.7.2 over egcs, because 2.7.2 on x86 seems to be better balanced in this
regard.

-Andi

[1] Some code for it is here in egcs, but it is not functional and the
compiler passes don't use the information yet.
[2] That is simplified, but basically true.
