Re: [going OFF TOPIC] Re: do_fast_gettimeoffset oops explained (6x86MX Bug)

André Derrick Balsa (andrebalsa@altern.org)
Fri, 15 May 1998 09:48:20 -0100


Hi Rafael,

Rafael R. Reilova wrote:
>
...
> Thank you all for your replies. I tried your suggestions (see code at
> end), and it does look like the TSC is slowing/stopping, but it is hard
> to tell if it's stopping completely (see included output).

The TSC doesn't stop completely. It only stops while Linux is idle
(because the 6x86MX goes into suspend mode and powers down the TSC
circuitry). It restarts when the CPU comes out of the Halt state, i.e.
when an interrupt arrives, such as the timer interrupt (once every 10
ms).

> I have yet to
> receive a kernel Oops and my uptime is two days with the suspend-on-halt
> enabled (probably I have just been lucky so far).

That's one of the reasons this bug was so difficult to trace. The Oopses
appear at random intervals.
>
> Is the TSC supposed to count at the full clock rate all the time? Will
> not doing so screw up the kernel? If the TSC is stopping then Cyrix docs
> are completely wrong. They actually state the exact opposite.

No, yes and correct. :)

No:
===
The TSC is supposed to count CPU clock cycles. That's what it says in
the Intel Pentium, the AMD K5/K6, the Centaur C6 and the Cyrix
documentation. The Pentium doesn't have a suspend mode like the Cyrix
CPU, but you can still stop or slow down the Pentium clock, and when you
do, the TSC slows down or stops counting with it. The same holds for all
the other x86 CPUs with TSC capability (AMD K5/K6, 6x86MX, PII, PPro,
Centaur C6).

Yes:
====
The Linux kernel has, in /arch/i386/kernel/time.c, a function called
do_fast_gettimeoffset. This function is enabled if the CPU
identification routine (/arch/i386/kernel/setup.c) detects that the CPU
has a TSC. The do_fast_gettimeoffset algorithm assumes that the TSC is
never stopped or slowed down, i.e. that it is just a very fast, very
accurate real time clock. However, this assumption does not hold if any
of the following conditions is true:
a) APM is enabled (because the CPU clock can be slowed down or stopped,
independently of the vendor of the CPU), or
b) you have a 6x86MX _and_ have explicitly enabled the Suspend-on-Halt
feature, or
c) you have a Centaur C6, or
d) the TSC gets written by something else.

"Screwing up the kernel" is vague, but a stopped TSC can lead to a
division by zero fault, which crashes the machine. A slowed down TSC
will lead to incorrect time data returned by gettimeofday().
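
To make the failure mode concrete, here is a minimal sketch of TSC-based
interpolation between two timer ticks. It is not the code from time.c:
the function name tsc_offset_usec and all of its parameters are made up
for illustration. The zero check marks the spot where an unguarded
division would fault if the TSC had not advanced between two ticks.

```c
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only (not the kernel's code).
 * Estimates the microseconds elapsed since the last timer tick by
 * scaling the TSC delta with the cycle count measured over one tick.
 * Returns -1 if cycles_per_tick is zero, i.e. the TSC did not advance
 * during the measured tick and the division below would fault.
 */
long tsc_offset_usec(uint32_t tsc_now, uint32_t tsc_at_last_tick,
                     uint32_t cycles_per_tick, uint32_t usec_per_tick)
{
    uint32_t delta;
    uint64_t usec;

    if (cycles_per_tick == 0)
        return -1;              /* stopped TSC: refuse to divide by zero */

    /* unsigned subtraction stays correct across a 32-bit wraparound */
    delta = tsc_now - tsc_at_last_tick;

    usec = (uint64_t)delta * usec_per_tick / cycles_per_tick;
    if (usec > usec_per_tick)
        usec = usec_per_tick;   /* never report more than one full tick */

    return (long)usec;
}
```

If cycles_per_tick is measured while the TSC is halted for most of the
tick, the division still succeeds, but the result no longer corresponds
to real elapsed time. That is the gettimeofday() error described above.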

Correct:
========
The Cyrix documentation states that the TSC will continue counting when
the CPU enters suspend mode. This is WRONG. The TSC stops counting when
the CPU enters a HALT state and Suspend-on-Halt has been enabled (note
that the default is to have Suspend-on-Halt disabled after a CPU reset).
If you want to write to Cyrix and tell them to include this in their
errata, please do so. I already did, and as usual, didn't get an answer.
Note that this is _not_ a bug, the CPU is just doing what it's designed
to do. OTOH they should correctly document this.

>
> Just a thought: if the TSC stopping/slowing is such a bad thing, what
> happens when an APM BIOS/mobo slows down the system clock? Crashes would
> also happen on Intel, if the BIOS goes as far as stopping the system
> clock. Yet this doesn't happen; what am I missing?

Check the source code. You will see #ifndef CONFIG_APM around the
do_fast_gettimeoffset() function. The time.c code avoids using the TSC
when APM is enabled in the kernel config. This workaround was added
because people noticed kernel oopses happened with APM enabled.
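
The effect of that #ifndef can be sketched as below. This is an
illustration only, not the time.c source: enum offset_source,
pick_offset_source, OFFSET_PIT and OFFSET_TSC are invented names, and
the idea that the fallback path reads the timer chip is an assumption of
this sketch.

```c
/* Hypothetical names, for illustration only (not from time.c). */
enum offset_source {
    OFFSET_PIT,   /* slow path: assumed to read the timer chip */
    OFFSET_TSC    /* fast path: interpolate using the TSC */
};

/*
 * Pick the time offset source. When CONFIG_APM is defined, the TSC
 * branch is compiled out entirely, so the fast path can never be
 * selected, whatever the CPU reports.
 */
enum offset_source pick_offset_source(int cpu_has_tsc)
{
#ifndef CONFIG_APM
    if (cpu_has_tsc)
        return OFFSET_TSC;
#else
    (void)cpu_has_tsc;    /* unused when APM support is configured */
#endif
    return OFFSET_PIT;
}
```

Built without -DCONFIG_APM, a CPU with a TSC gets OFFSET_TSC. Built
with it, every CPU gets OFFSET_PIT.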

Unfortunately, nobody has come up with an APM-compatible
do_fast_gettimeoffset(). Yet. I am working on it.
>
> Curious/brave Cyrix users can try the following program. Of course, I
> take no responsibility for what may become of your computer afterwards.
>
> BTW, suspending the CPU is a Good Thing, iff it doesn't break anything.

I agree 100%. Hence my previous patch 5 months ago that never made it to
the 2.1.x kernels (but made it to 2.0.34pre), and the present patch.
>
> -------------------------------------------
> /* program to test the TSC bug on Cyrix 6x86 and 6x86MX */
> #include <stdio.h>
> #include <stdlib.h> /* exit() */
> #include <unistd.h>
> #if (__GNUC__ == 2)
> #include <sys/perm.h>
> #endif
> #include <asm/io.h>
>
> /* asm macro to read the tsc; volatile keeps each read in place */
> #define rdtsc(LSB, MSB) asm volatile("rdtsc" : "=a" (LSB), "=d" (MSB))
>
> /* define to test suspend-on-halt, needs root perms */
> #define SUSP_OHALT 1
>
> /* print the TSC delta (low word only) across each 1 second sleep */
> void loop_test(int times)
> {
>     unsigned long msb, lsb, prev;
>
>     rdtsc(prev, msb);
>     for (; times; times--) {
>         sleep(1);
>         rdtsc(lsb, msb);
>         printf("%lu cycles in 1 sec.\n", lsb - prev);
>         prev = lsb;
>     }
> }
>
> int main(void)
> {
> #ifdef SUSP_OHALT
>     unsigned char data;
>
>     /* get access to index port 0x22 and data port 0x23 */
>     if (ioperm(0x22, 2, 1) != 0) {
>         perror("ioperm");
>         exit(1);
>     }
>
>     /* enable suspend on halt: set bit 3 of the register at index 0xc2 */
>     outb(0xc2, 0x22);
>     data = inb(0x23);
>     outb(0xc2, 0x22);
>     outb(data | 8, 0x23);
>     loop_test(5);
>
>     /* disable suspend on halt: clear the same bit */
>     outb(0xc2, 0x22);
>     data = inb(0x23);
>     outb(0xc2, 0x22);
>     outb(data & ~8, 0x23);
> #endif
>
>     loop_test(5);
>     exit(0);
> }
>
> ------------------------------
> Sample output
>
> 390413 cycles in 1 sec. <- with suspend on halt enabled
> 459959 cycles in 1 sec.
> 430947 cycles in 1 sec.
> 455201 cycles in 1 sec.
> 579859 cycles in 1 sec.
> 151261875 cycles in 1 sec. <- normal operation (~150MHz)
> 151502457 cycles in 1 sec.
> 151502139 cycles in 1 sec.
> 151502479 cycles in 1 sec.
> 151502037 cycles in 1 sec.
>
...
Nice program, really demonstrates what I have been saying since my first
email which started the original thread.
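
The sample output also gives a rough measure of how much of the time the
TSC was running on that idle machine: divide a count taken with
Suspend-on-Halt enabled by one taken in normal operation. A small sketch
of that arithmetic follows; the function name tsc_running_fraction is
made up here.

```c
/*
 * Hypothetical helper, for illustration only.
 * Returns the fraction of the nominal cycle count that the TSC actually
 * counted over the same interval, or 0.0 if nominal is zero.
 */
double tsc_running_fraction(unsigned long counted, unsigned long nominal)
{
    if (nominal == 0)
        return 0.0;
    return (double)counted / (double)nominal;
}
```

For example, tsc_running_fraction(455201, 151502139) is about 0.003,
i.e. the TSC was counting for roughly 0.3% of that second.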

Thanks,
------------------------
André Balsa
andrebalsa@altern.org
