Re: Everything you want to know about time (Was: Cyrix 6x86MX and Centaur C6 CPUs in 2.1.102)

André Derrick Balsa (andrebalsa@altern.org)
Thu, 21 May 1998 09:39:24 -0100


Hi Scott,

Kudos to you for explaining what do_fast_gettimeoffset() does (and how
it does what it does)! :)

Since your post was very long, excuse me for snipping some parts of it.
I have just a few comments/additions:

C. Scott Ananian wrote:
...
> *However*, there are many cases where it is not reasonable to assume that
> "average internal clocks per usec" is a constant. Haltable CPUs are one
> example, as are APM-enabled machines that slow the processor clock to save
> power. And let's not forget those ol' machines with the 'turbo' button on
> the front. Perhaps an average over the last N jiffies is more
> appropriate.

Actually, there is a simple way to do a RDTSC calibration that will get
us an accurate measurement of the number of cpu_clock_cycles/jiffy:

At kernel boot, in time_init (which should run with interrupts
disabled),

a) Do a busy wait on the CTC timer until it zeroes.
b) Now read the TSC.
c) Do another busy wait on the CTC timer until it zeroes.
d) Read the TSC again.
e) The difference in TSC readings is our cpu_clock_cycles/jiffy.

Maximum time spent on this calibration procedure is just 0.02 seconds
(two jiffies at HZ=100), and this is at boot time. Note that since both
the CTC timer and the CPU clock (which drives the TSC) depend on the same
quartz crystal (a 14.31818MHz part), they will never drift apart, so
there is *no* need to repeat this calibration procedure.
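
Steps a) to e) above could be sketched in C roughly as follows. This is
only an illustration under stated assumptions, not the code in time.c:
the clock_ops accessors and the sim_clock simulation are hypothetical
names invented here so the sketch runs without real hardware, and the
wait loop detects the counter reloading rather than an exact zero, since
a poll may never land on zero.

```c
#include <stdint.h>

/* Hypothetical hardware accessors; a real kernel would read the CTC
 * through its I/O ports and the TSC with the RDTSC instruction. */
struct clock_ops {
    uint16_t (*read_ctc)(void *ctx);  /* current CTC down-counter value */
    uint64_t (*read_tsc)(void *ctx);  /* current TSC value */
    void *ctx;
};

/* Steps a and c: spin until the down-counter reloads. Comparing against
 * the previous sample catches the wrap even if no poll sees zero. */
void wait_for_ctc_wrap(const struct clock_ops *ops)
{
    uint16_t prev = ops->read_ctc(ops->ctx);

    for (;;) {
        uint16_t cur = ops->read_ctc(ops->ctx);
        if (cur > prev)   /* counter jumped back up: a tick boundary passed */
            return;
        prev = cur;
    }
}

uint64_t calibrate_cycles_per_jiffy(const struct clock_ops *ops)
{
    uint64_t start, end;

    wait_for_ctc_wrap(ops);           /* a) wait for the CTC to wrap */
    start = ops->read_tsc(ops->ctx);  /* b) read the TSC             */
    wait_for_ctc_wrap(ops);           /* c) wait for the next wrap   */
    end = ops->read_tsc(ops->ctx);    /* d) read the TSC again       */
    return end - start;               /* e) cpu_clock_cycles/jiffy   */
}

/* Simulated hardware (made-up numbers) so the sketch runs anywhere. */
struct sim_clock {
    uint64_t cycles;            /* simulated CPU cycles elapsed so far   */
    uint64_t cycles_per_jiffy;  /* true value the calibration should find */
    uint64_t ctc_reload;        /* CTC counts per jiffy                  */
};

uint16_t sim_read_ctc(void *ctx)
{
    struct sim_clock *c = ctx;
    uint64_t phase;

    c->cycles += 100;  /* pretend each port read costs 100 cycles */
    phase = c->cycles % c->cycles_per_jiffy;
    /* Counter runs from ctc_reload - 1 down to 0 within each jiffy. */
    return (uint16_t)(c->ctc_reload - 1
                      - phase * c->ctc_reload / c->cycles_per_jiffy);
}

uint64_t sim_read_tsc(void *ctx)
{
    struct sim_clock *c = ctx;
    return c->cycles;
}
```

With a simulated 200MHz CPU (2000000 cycles per jiffy) the result lands
within one polling interval of the true value, which is the accuracy
limit of any busy-wait scheme like this one.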

...
> THE EVIL OOPS POTENTIAL: (this is what really matters)
> ~~~~~~~~~~~~~~~~~~~~~~~~
> BUT some machines do the unthinkable -- they actually randomly destroy the
> TSC value during "power-saving." This is the Cyrix bug.

Sorry if I am being picky, but this is not quite so. The Cyrix doesn't
destroy the TSC or anything of the sort.

We have two cases:

1) If the Suspend-on-Halt feature is disabled (default state after a
reset), the Cyrix 6x86MX will behave just like the Intel part.

2) If Suspend-on-Halt is _explicitly_ enabled (e.g. using a utility
called set6x86), the Cyrix 6x86MX acts like the Centaur C6 step 0, i.e.
it stops the TSC when Halted.

Apart from this small detail, your description of the oops in
do_fast_gettimeoffset() is 100% correct.

> This also occurs
> during APM suspend: only the low 32 bits of the TSC can be restored after
> you power off the processor; the high 32 bits are zeroed. [The Centaur
> shouldn't Oops, as it doesn't destroy the TSC, it just stops it. This
> leads to a more subtle (but non-catastrophic) problem, discussed later.]
> It took me some thinking to figure out exactly how destroying the TSC
> causes the oops, so I'll recap the process for you. (It seems obvious in
> retrospect; the hard part was decoding the asm. Brainy people can read
> linux/arch/i386/kernel/time.c, understand it immediately, and then skip
> ahead.)
>
Even in retrospect I can't say it's obvious. Thanks for explaining it so
well.

> So, there are two issues here. First off, we should definitely not use
> this routine if the TSC is likely to be trashed for any reason: we'll get
> bad time values, and the occasional Oops. (Note that only intra-jiffy
> times are affected; read the source for the details on how the error is
> bounded). ['quotient' for a trashed TSC will either be very large or very
> small, leading to intra-jiffy times that either race ahead or lag behind]

The error is bounded as follows:

the maximum error is 1 jiffy minus 1 microsecond, measured over 1 jiffy.
Or if you prefer, just under 100% over 1 jiffy, 50% over 2 jiffies, etc.
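
To make the arithmetic concrete, here is a small C sketch of that bound.
It assumes HZ=100 (so 10000 microseconds per jiffy); the function name is
made up for illustration.

```c
#define USEC_PER_JIFFY 10000L  /* assumption: HZ = 100 */

/* Worst-case relative error, in percent, of a time interval spanning
 * n_jiffies (n_jiffies >= 1) when the intra-jiffy offset can be wrong
 * by at most 1 jiffy minus 1 microsecond. */
double worst_case_error_percent(long n_jiffies)
{
    double max_err_usec = (double)(USEC_PER_JIFFY - 1);

    return 100.0 * max_err_usec / ((double)n_jiffies * USEC_PER_JIFFY);
}
```

For one jiffy this gives 99.99%, for two jiffies 49.995%, and so on: the
absolute error stays fixed while the interval it is spread over grows.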
>
> Second, is a cumulative average really the way to go here?

Definitely *not*. BTW if we are concerned with the time it takes for
gettimeofday() to execute, _this_ cumulative average algorithm in
do_fast_gettimeoffset() is a low performance solution.
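
For comparison, here is a sketch of what the offset computation might
look like with a fixed cycles/jiffy figure measured once at boot instead
of a cumulative average. This is an assumption about one possible
replacement, not the existing kernel routine; tsc_offset_usec and its
parameters are hypothetical, and HZ=100 is assumed.

```c
#include <stdint.h>

#define USEC_PER_JIFFY 10000u  /* assumption: HZ = 100 */

/* Microseconds elapsed since the last timer tick, derived from the TSC.
 * cycles_per_jiffy is assumed to come from a one-time boot calibration. */
uint32_t tsc_offset_usec(uint64_t tsc_now, uint64_t tsc_last_tick,
                         uint64_t cycles_per_jiffy)
{
    uint64_t delta = tsc_now - tsc_last_tick;  /* wraps if the TSC went backwards */

    /* A stopped or trashed TSC shows up as a zero or absurdly large delta.
     * Clamping first keeps the result inside the current jiffy and also
     * keeps the multiplication below from overflowing. */
    if (cycles_per_jiffy == 0 || delta >= cycles_per_jiffy)
        return USEC_PER_JIFFY - 1;

    return (uint32_t)(delta * USEC_PER_JIFFY / cycles_per_jiffy);
}
```

The clamp is what keeps a bad TSC reading from pushing the reported time
more than one jiffy off. A real implementation would probably replace the
division with a precomputed reciprocal, since this sits on the
gettimeofday() path.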

>
> Lastly, exporting the real CPU cycle time to /proc/cpuinfo. Not a
> difficult thing: we've got init_timer_cc, last_timer_cc, and jiffies in
> memory -- not to mention cached_quotient sitting around. The hairy thing
> is that this information is probably not available on all architectures,
> and exporting the relevant data in a architecture-independent manner will
> probably be hairy. Not impossible, but it will have to be done carefully
> and cleanly if it's going to make it past Linus.

if (machine_supports_tsc)
        show_MegaHertz_rating
else
        show_Bogomips_rating

should go a long way to solve the confusion in many people's minds about
bogomips, MHz and CPU performance...
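
Spelled out in C, the idea might look something like the sketch below.
The function name, its parameters and the exact output strings are all
hypothetical, chosen only for illustration; it assumes HZ=100 and a
cycles/jiffy figure obtained from a TSC calibration.

```c
#include <stdio.h>
#include <stddef.h>

#define HZ 100ULL  /* assumption: 100 jiffies per second */

/* Write one speed line into buf: a MHz rating when the machine has a TSC,
 * the BogoMIPS figure otherwise. bogomips_x100 is BogoMIPS in hundredths. */
int format_cpu_speed(char *buf, size_t len, int machine_supports_tsc,
                     unsigned long long cycles_per_jiffy,
                     unsigned long bogomips_x100)
{
    if (machine_supports_tsc) {
        /* cycles per second, expressed in kHz to keep three decimals */
        unsigned long long khz = cycles_per_jiffy * HZ / 1000ULL;

        return snprintf(buf, len, "cpu MHz : %llu.%03llu\n",
                        khz / 1000ULL, khz % 1000ULL);
    }
    return snprintf(buf, len, "bogomips : %lu.%02lu\n",
                    bogomips_x100 / 100, bogomips_x100 % 100);
}
```

With 2000000 cycles per jiffy this reports 200.000 MHz; without a TSC, a
value of 7995 hundredths is reported as 79.95 BogoMIPS.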

> Remember that all the
> reasons for not using do_fast_gettimeoffset are also going to be reasons
> why measuring your CPU clock rate this way isn't going to work. Probably
> best to stick with BogoMIPS.

Bogomips should be kept internally because there is kernel code that
still depends on it for accurate timing, I think. Reporting it in
/proc/cpuinfo is another issue, IMHO.

Cheers,
------------------------
André Balsa
andrebalsa@altern.org
