Re: Semaphore assembly-code bug

From: Linus Torvalds
Date: Mon Nov 01 2004 - 22:15:47 EST




On Mon, 1 Nov 2004, linux-os wrote:
>
> Wrong.
>
> (1) The '486 didn't have the rdtsc instruction.
> (2) There are no 'serializing' or other black-magic aspects of
> using the internal cycle-counter. That's exactly how you you
> can benchmark the execution time of accessible code sequences.

Sorry, but you shouldn't argue with people who know more than you do. I
know Dean, and he analyzes things for work, and does know what he is
doing.

"rdtsc" _does_ partly serialize things, and it's not even architecturally
defined, so you'll find that it serializes things in different ways on
different CPU's. You can't just do

rdtsc
...
rdtsc

and expect the stuff in between the rdtsc's to be timed exactly: some of
it will overlap with the rdtsc's, some of it won't.

On Intel, if I recall correctly, rdtsc is totally serializing, so you're
testing not just the instructions between the rdtsc's, but the length of
the pipeline, and the time it takes for stuff around it to calm down.
Which is why two rdtsc's in sequence will show quite a lot of overhead on
a P4 (something like 80 cycles).

So you really want to do more operations in between the rdtsc's.

Try the appended program. On a P4, the two sequnces are the same for me
(92 cycles, 80 cycles overhead), while on a Pentium M, the sequence of two
popl's (57 cycles) is faster than the sequence of "lea+popl" (59 cycles)
and the overhead is 47 cycles.

So can you _please_ just admit that you were wrong? On a P4, the pop/pop
is the same cost as lea/pop, and on a Pentium M the pop/pop is faster,
according to this test. Your contention that "pop" has to be slower than
"lea" is WRONG.

Linus

----
#define PUSHEBX "pushl %%ebx\n\t"
#define PUSHECX "pushl %%ecx\n\t"
#define POPECX "popl %%ecx\n\t"
#define POPEBX "popl %%ebx\n\t"

#ifdef TEST_LEA

#undef POPECX
#define POPECX "leal 4(%%esp),%%esp\n\t"

#endif

#ifdef TEST_OVERHEAD

#undef PUSHEBX
#undef PUSHECX
#undef POPEBX
#undef POPECX

#define PUSHEBX
#define PUSHECX
#define POPEBX
#define POPECX

#endif

int main(void)
{
unsigned long start;
unsigned long long end;

asm volatile(
PUSHEBX
PUSHECX
PUSHEBX
PUSHECX
PUSHEBX
PUSHECX
PUSHEBX
PUSHECX
PUSHEBX
PUSHECX
PUSHEBX
PUSHECX
PUSHEBX
PUSHECX
PUSHEBX
PUSHECX
"rdtsc\n\t"
POPECX
POPEBX
POPECX
POPEBX
POPECX
POPEBX
POPECX
POPEBX
POPECX
POPEBX
POPECX
POPEBX
POPECX
POPEBX
POPECX
POPEBX
"movl %%eax,%%esi\n\t"
"rdtsc"
:"=A" (end), "=S" (start));
printf("%ld cycles\n", (long) end-start);
}

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/