Re: [patch] __volatile__ needed in get_cycles()?

Tigran Aivazian (tigran@sco.COM)
Tue, 30 Mar 1999 08:05:39 +0100 (BST)


Hi Andrea,

On Mon, 29 Mar 1999, Andrea Arcangeli wrote:

> +extern cycles_t inline get_cycles_ordered(void)
> +{
> + /* I know I should not use `register' but I can't resist ;). -Andrea */
> + register cycles_t foo;
> +
> + __asm__ __volatile__ ("lock; addl $0,0(%%esp)": :);
> + foo = get_cycles();
> + __asm__ __volatile__("": :);
> +
> + return foo;
> +}
> +
Yes, I like the above, I did not know you can do

__asm__ __volatile__("": :);

to stop compiler from re-ordering things (because I never looked at wmb()
macro). (and I like the "pseudo-smiley" at the end :).

Also, certainly bus-locked movl (or addl) is cheaper than cpuid (cpuid is
messy and trashes your registers).

>
> This my new inline function should assure both asm ordering and CPU
> ordering, but as just said, get_cycles() is just fine and this one is a
> different issue that so far is been handled by hand (also because usually
> you need this trick only for profiling and not for real code).
> ...
> Note also the the original mb() was also flushing the register set because
> such move to memory was needed for SMP safe operation, while instead I
> only assured the ordering of the instruction at the CPU level and the
> ordering of the asm. I hope I have not missed something ;). A safer
> version would be
>
> mb()
> ret = get_cycles()
> barrier()
>
> (and it would have worked also in include/linux/timex.h) but it looked to
> me an overhead that could also lead to not relialable profiling results.

Yes, I'd call the profiling results with minimum overhead (including cache
side effects) more reliable, although in the Intel's paper I mentioned
yesterday
(URL http://developer.intel.com/drg/pentiumII/appnotes/RDTSCPM1.pdf for
those who want it) they consider the other kind as well.

>
> Comments?
The only comment is - I hope Linus (when he is back from holiday) will
consider including your get_cycles_ordered() in <asm-i386/timex.h> as it
seems worth having (and I see no problems, but I may be wrong).
I assume that if something is inline but is never referenced gcc will
simply ignore it so, whoever wants to profile something he will just put
it there manually. Almost (ignoring compile-time) zero overhead.

Regards,
Tigran.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/