Re: Semaphore assembly-code bug

From: dean gaudet
Date: Fri Oct 29 2004 - 18:58:56 EST




On Fri, 29 Oct 2004, Andreas Steinmetz wrote:

> > On Fri, 29 Oct 2004, Andreas Steinmetz wrote:
> > > Sample quote from said manual (P/N 248966-05):
> > > "Use the lea instruction and the full range of addressing modes to do
> > > address calculation"
...
> Some more data from said manual (lea is better on P3 and the same as add on
> P4):

you really need to understand intel optimisation guides. it helps to diff
them over time to see the types of things that go in and out of fashion.

> I don't know about P4 internals but let me make some guess:
> There's lot of software around that needs to run on older processors where lea
> has quite some performance advantage. Thus I would guess that the P4 design
> respects this by handling lea x(esp),esp efficiently.

your guess is generally wrong... try measuring it.

for p4 model 0 through 2 it was faster to avoid lea and shl and generate
code like:

add %ebx,%ebx
add %ebx,%ebx
add %ebx,%ebx
add %ebx,%ebx

which would complete in 2 cycles, compared to 4 cycles for lea or a shift.
but that crap doesn't apply to any other x86 (except efficeon which
notices this crud and converts it to its own optimal sequence).

p4 model 2 is probably way more common than p4 model 3 still.

you also need to be aware of k7/k8. AMD has their own optimisation guide
(i'm too lazy to find url/#). but the important point for lea and AMD is
that it is a 2 cycle latency operation, and add is 1 cycle.

but you know what? we can talk about what the optimization guides say
until we're blue... the only thing which matters is experience. go
measure it. (i've measured a bazillion things like this.)

use pop, don't use lea to modify esp.

-dean
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/