Re: Semaphore assembly-code bug

From: linux-os
Date: Sun Oct 31 2004 - 20:40:06 EST


On Fri, 29 Oct 2004, Linus Torvalds wrote:



On Fri, 29 Oct 2004, linux-os wrote:

Linus, there is no way in hell that you are going to move
a value from memory into a register (pop ecx) faster than
you are going to do anything to the stack-pointer or
any other register.

Sorry, but you're wrong.

I am not wrong.

I don't understand anything about your theoretical CPU
with the magic stack engine. Anything I can get my
hands on functions exactly as I described and exactly
as would be expected. We work with real hardware here
and I have to test it as part of my job.

And, FYI, I spend all my working time trying to get the
last iota of performance out of ix86 CPUS. Since I can
only read publicly available documentation, I have
to test code in actual operation.

The attached file shows that the Intel Pentium 4 runs
exactly as I described. Further, there is no difference in
the CPU clocks used when adding a constant to the stack-
pointer or using LEA.

It also shows that poping stack-data into the same register
twice, as you suggested, takes the same time as using a
different register.


Timer overhead = 88 CPU clocks
push 3, pop 3 = 12 CPU clocks
push 3, pop 2 = 12 CPU clocks
push 3, pop 1 = 12 CPU clocks
push 3, pop none using ADD = 8 CPU clocks
push 3, pop none using LEA = 8 CPU clocks
push 3, pop into same register = 12 CPU clocks

The code uses a separate assembly-language file so that
the 'C' compiler can't optimize-away what I am measuring.
It also saves and uses the shortest number of CPU cycles
so the code doesn't have to execute with the interrupts
OFF to get a stable reading.


Learn about modern CPU's some day, and realize that cached accesses are
fast, and pipeline stalls are relatively much more expensive.


That's what I do, and that's what I teach.

Now, if it was uncached, you'd have a point.

Also think about why

call xxx
jmp yy

is often much faster than

push $yy
jmp xxx

and other small interesting facts about how CPU's actually work these
days.

Linus


Cheers,
Dick Johnson
Penguin : Linux version 2.6.9 on an i686 machine (5537.79 BogoMips).
Notice : All mail here is now cached for review by John Ashcroft.
98.36% of all statistics are fiction.

Attachment: test-sp.tar.gz
Description: GNU Zip compressed data