Re: Semaphore assembly-code bug

From: Linus Torvalds
Date: Sat Oct 30 2004 - 12:55:46 EST




On Sat, 30 Oct 2004, Denis Vlasenko wrote:
>
> On Saturday 30 October 2004 05:13, Andi Kleen wrote:
> > Linus Torvalds <torvalds@xxxxxxxx> writes:
> >
> > > Anyway, it's quite likely that for several CPU's the fastest sequence ends
> > > up actually being
> > >
> > > movl 4(%esp),%ecx
> > > movl 8(%esp),%edx
> > > movl 12(%esp),%eax
> > > addl $16,%esp
> > >
> > > which is also one of the biggest alternatives.
> >
> > For K8 it should be the fastest way. K7 probably too.
>
> Pity. I always loved 1 byte insns :)

I personally am a _huge_ believer in small code.

The sequence

popl %eax
popl %ecx
popl %edx
popl %eax

is four bytes. In contrast, the three moves and an add is 15 bytes. That's
almost 4 times as big.

And size _does_ matter. The extra 11 bytes means that if you have six of
these sequences in your program, you are pretty much _guaranteed_ one more
icache miss from memory. That's a few hundred cycles these days.
Considering that you _maybe_ won a cycle or two each time it was executed,
it's not at all clear that it's a win, except in benchmarks that have huge
repeat-rates. Real life doesn't usually have that. In many real-life
schenarios, repeat rates are in the tens of hundreds for most code...

And that's ignoring things like disk load times etc.

Sadly, the situation is often one where when you actually do all the
performance testing, you artificially increase the repeat-rates hugely:
you run the same program a thousand times in order to get a good profile,
and you keep in the the cache all the time. So performance analysis often
doesn't actually _see_ the downsides.

Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/