Re: [PATCH] include/asm-i386/semaphore.h optimizations

Jamie Lokier (lkd@tantalophile.demon.co.uk)
Tue, 27 Jul 1999 18:58:35 +0200

Messages sorted by: [ date ][ thread ][ subject ][ author ]
Next message: Dan Kegel: "Why no 8-CPU TPC benchmarks for Linux? (Was: Re: ENT Article Archive"
Previous message: Ulf Jaenicke-Roessler: "knfs, 2.2.10-ac12 oops"

Artur Skawina wrote:
> just for fun, and to see how much it matters on modern cpus
> i benchmarked
> asm("call 2f; jmp 1f; nop\n2: ret\n1:" );
> vs
> asm("pushl $1f; jmp 2f; nop\n2: ret\n1:" );
> The latter is up to six times slower here. [1]
> For more normal code the difference is obviously
> much smaller, but the effects on the btbs are probably
> still there.

This is because of the RSB (Return Stack Buffer) introduced with the
P5/MMX time. It predicts return addresses, but doing a push/ret
sequence can totally mess up the prediction stack.

> o changes "inc 0(ecx)" to "inc (ecx)". gas does not change the
> addressing mode in this case - why waste a full byte of icache
> for every semaphore op?.. :)

Ok.

> o makes the contention case use 'conventional' calling style.

Perhaps this change only makes sense if the out-of-line case ever
returns without scheduling. I'm not sure if it does (but I can think of
a few tricks where that would be useful). If it always schedules the
RSB mispredication overhead is still there and you've added a bit more
overhead. Though for some cases of semaphore contention I guess the RSB
could be right despite scheduling.

-- Jamie

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/

Next message: Dan Kegel: "Why no 8-CPU TPC benchmarks for Linux? (Was: Re: ENT Article Archive"
Previous message: Ulf Jaenicke-Roessler: "knfs, 2.2.10-ac12 oops"