Re: [patch 0/2] Immediate Values - jump patching update

From: Ingo Molnar
Date: Mon Apr 28 2008 - 18:11:58 EST



* H. Peter Anvin <hpa@xxxxxxxxx> wrote:

>>> I still think this is the completely wrong approach.
>>
>> hm, can it result in a broken kernel? If yes, how? Or are your
>> objections more high-level?
>
> My objections are higher level: I believe the current code is (a)
> painfully complex, and I'd rather not see it in the kernel, and (b)
> the wrong thing anyway.
>
> Put a 5-byte nop in as the marker, and patch it with a call
> instruction, out of line, to a collector function.

the counter-argument was that, going by a specific analysis of sched.o,
this results in slower code. The reason is that the "function call
parameter preparation" halo around that 5-byte patch site costs more
than a single conditional branch to an offline part of the current
function.

i.e. the current optimized marker approach does roughly this:

[ .... fastpath head .......... ]
[ immediate value instruction . ] --->
[ branch instruction .......... ] ---> these two get NOP-ed out
[ .... fastpath tail .......... ]
[ ............................. ]
[ ... offline area ............ ]
[ ... parameter preparation ... ]
[ ... marker call ............. ]

your proposed 5-byte call NOP approach (which btw. was what i proposed
multiple times in the past 2 years) would do this:

[ .... fastpath head .......... ]
[ ... parameter preparation ... ]
[ .... 5-byte CALL ............ ] ---> NOP-ed out
[ .... fastpath tail .......... ]
[ ............................. ]

in the first case we have little "marker parameter/value preparation"
cost: it all happens in the 'offline area' _by GCC_. I.e. the fastpath
is relatively undisturbed.
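
a minimal userspace sketch of this first layout, to make the shape
concrete. It is not the kernel's marker code: marker_enabled is a
hypothetical plain variable standing in for the patched immediate value,
and marker_probe(), prepare_load() and switch_task() are made-up names.
The only real facilities used are GCC's __builtin_expect() and the
noinline attribute. The point is that the argument preparation sits
inside the unlikely branch, so it does not run while the marker is off:

```c
#include <assert.h>

/* Hypothetical stand-in for the immediate value: 0 means marker disabled. */
static volatile int marker_enabled;

static int prep_count;   /* how often argument preparation ran */
static int probe_hits;   /* how often the probe was called */

/* Hypothetical probe; noinline so that it stays a real call. */
__attribute__((noinline))
static void marker_probe(int prev, int next, long load)
{
    (void)prev; (void)next; (void)load;
    probe_hits++;
}

/* Stand-in for the "parameter preparation" work. */
static long prepare_load(int prev, int next)
{
    prep_count++;
    return (long)prev * 1000 + next;
}

int switch_task(int prev, int next)
{
    int ret = prev + next;                  /* fastpath head */

    /* Hinted as unlikely, so the compiler may move this body out of line. */
    if (__builtin_expect(marker_enabled, 0)) {
        /* offline area: preparation and call happen only here */
        marker_probe(prev, next, prepare_load(prev, next));
    }

    return ret * 2;                         /* fastpath tail */
}

/* Self-check: nothing is prepared while the marker is off. */
void demo_conditional_marker(void)
{
    marker_enabled = 0;
    switch_task(1, 2);
    assert(prep_count == 0 && probe_hits == 0);

    marker_enabled = 1;
    switch_task(1, 2);
    assert(prep_count == 1 && probe_hits == 1);
}
```

calling demo_conditional_marker() from a test harness exercises both
states. Whether the compiler really places the branch body out of line
depends on the compiler and optimization level; the hint only makes it
likely.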

in the latter case, the whole 'parameter preparation' phase has to
happen around the 5-byte CALL site, in the fastpath. Mathieu's
assembly-level analysis of sched.o showed this to be a pessimisation.
We are better off inserting that conditional and letting gcc generate
the call than forcing it into the middle of the fastpath - even if we
end up NOP-ing out the call.
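
the same kind of userspace sketch for the call-site layout, again with
made-up names throughout. Portable C cannot patch a 5-byte NOP, so as an
assumption the patch site is emulated by a volatile function pointer
(call_site) that points either at an empty probe ("NOP-ed out") or at a
live one. Unlike a real NOP this still performs a call, but it shows the
property under discussion: the arguments are prepared on every pass,
whether or not the marker is active:

```c
#include <assert.h>

static int prep_count;   /* how often argument preparation ran */
static int probe_hits;   /* how often the live probe was called */

/* Emulates the NOP-ed out state: does nothing. */
static void probe_nop(int prev, int next, long load)
{
    (void)prev; (void)next; (void)load;
}

/* Emulates the patched-in CALL target. */
static void probe_live(int prev, int next, long load)
{
    (void)prev; (void)next; (void)load;
    probe_hits++;
}

/* Hypothetical stand-in for the patchable 5-byte site. */
static void (*volatile call_site)(int, int, long) = probe_nop;

/* Stand-in for the "parameter preparation" work. */
static long prepare_load(int prev, int next)
{
    prep_count++;
    return (long)prev * 1000 + next;
}

int switch_task_call_site(int prev, int next)
{
    int ret = prev + next;                  /* fastpath head */

    /* Preparation sits in the fastpath and runs unconditionally. */
    long load = prepare_load(prev, next);
    call_site(prev, next, load);            /* the "5-byte CALL" */

    return ret * 2;                         /* fastpath tail */
}

/* Self-check: preparation runs even while the site is "NOP-ed out". */
void demo_call_site_marker(void)
{
    call_site = probe_nop;
    switch_task_call_site(1, 2);
    assert(prep_count == 1 && probe_hits == 0);

    call_site = probe_live;
    switch_task_call_site(1, 2);
    assert(prep_count == 2 && probe_hits == 1);
}
```

with the marker off, prep_count still goes up here, while in the
conditional-branch layout it stays at zero. That difference is the
fastpath cost being argued about.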

wrt. complexity i agree with you - if the current optimization cannot be
made correct we have to fall back to a simpler variant, even if it's
slower.

Ingo