Re: [PATCH] include/asm-i386/semaphore.h optimizations

Artur Skawina (skawina@geocities.com)
Tue, 27 Jul 1999 20:18:41 +0200


Jamie Lokier wrote:
>
> > asm("call 2f; jmp 1f; nop\n2: ret\n1:" );
> > asm("pushl $1f; jmp 2f; nop\n2: ret\n1:" );
> > The latter is up to six times slower here. [1]
> > For more normal code the difference is obviously
> > much smaller, but the effects on the btbs are probably
> > still there.
>
> This is because of the RSB (Return Stack Buffer) introduced around the
> P5/MMX time. It predicts return addresses, but doing a push/ret
> sequence can totally mess up the prediction stack.

Correct (except that the docs I have suggest it was introduced with the PPro).

> > o makes the contention case use 'conventional' calling style.
>
> Perhaps this change only makes sense if the out-of-line case ever
> returns without scheduling. I'm not sure if it does (but I can think of
> a few tricks where that would be useful). If it always schedules the
> RSB misprediction overhead is still there and you've added a bit more
> overhead. Though for some cases of semaphore contention I guess the RSB
> could be right despite scheduling.

Hmm, interesting thoughts. I can't at the moment think of a way to
time the difference; I strongly suspect that the old way was not
better: mispredicted branches caused by an out-of-sync RSB would
probably outweigh any benefit that the one removed unconditional jump
could give. These functions _can_ return without scheduling.
