Re: SMP time warps: count > LATCH - 1

From: Christopher Thompson (chris@hypocrite.org)
Date: Wed May 31 2000 - 10:17:45 EST


I have the same problem. On my machine, it is because my two processors run at
different speeds. I created a patch to address this; it is at
http://hypocrite.org/linux/tsc.patch.new.tar.gz and applies cleanly to at least
2.3.99-pre6 and -pre8, and probably to most other recent kernels. Apply the
patch and run make menuconfig (or whatever you use), then disable use of the
TSC. It is an option in the processor selection menu, iirc.

Please try this and let me know whether it fixes your problem. I DO NOT KNOW
IF IT WILL. The patch lets you disable use of the TSC on modern CPUs, so if
that is where your problem stems from, it should work. Either way, it would be
very useful to know the result.
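
To compare behaviour before and after the patch, something like the following
could be used. It is only a rough sketch, not part of the patch: the function
names tv_delta_usec and count_backward_steps are made up for illustration. It
calls gettimeofday() back to back and counts how often a reading is earlier
than the one before it.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/* Difference b - a in microseconds (negative if b is earlier than a). */
static long long tv_delta_usec(const struct timeval *a, const struct timeval *b)
{
    return ((long long)b->tv_sec - (long long)a->tv_sec) * 1000000LL
         + ((long long)b->tv_usec - (long long)a->tv_usec);
}

/*
 * Hypothetical helper: read the clock `iterations` times in a row and
 * return how many readings were earlier than the previous reading.
 * The most negative step seen, in microseconds, is stored in
 * *worst_usec (0 if time never went backwards).
 */
static unsigned long count_backward_steps(unsigned long iterations,
                                          long long *worst_usec)
{
    struct timeval prev, now;
    unsigned long backward = 0;
    unsigned long i;

    assert(worst_usec != NULL);
    *worst_usec = 0;

    gettimeofday(&prev, NULL);
    for (i = 0; i < iterations; i++) {
        long long d;

        gettimeofday(&now, NULL);
        d = tv_delta_usec(&prev, &now);
        if (d < 0) {
            backward++;
            if (d < *worst_usec)
                *worst_usec = d;
        }
        prev = now;
    }
    return backward;
}
```

Called from a small main() with a few million iterations, a nonzero return
value on the unpatched kernel and zero on the patched one would suggest the
TSC is the culprit. A clean run does not prove much, since the problem may
take a while to appear after a reboot.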

This is not going to end up in the official Linux kernel. Alan Cox says work
is in progress on a different (better) patch.

I would also like to know what motherboard you are using and the speed of
each of your CPUs.

On Tue, 30 May 2000, you wrote:
> Hi. I have a long-standing problem with timekeeping on my 2xPIII
> machine: sooner or later after a reboot, successive calls to
> gettimeofday will return large negative deltas (i.e. time goes
> backwards). Several kernel gurus have offered help and patches, but
> to no avail; I usually end up just brute-forcing the kernel to never
> return a time earlier than one previously returned.
>
> I recently tried 2.3.99-pre7 to see if anything was different (time
> is still screwed up); but I added some printk's to see what was
> happening. I've found that in the delay_at_last_interrupt calculation
> near .../arch/i386/kernel/time.c:523:
>
> count = ((LATCH-1) - count) * TICK_SIZE;
>
> the value retrieved from the i8253 timer becomes greater than
> (LATCH-1) for some reason, and forever after the clock is unstable.
> The correlation seems to be good: the system is OK until the first
> time (count > (LATCH-1)), and thereafter both (count > (LATCH-1)) and
> negative time steps happen tens to hundreds of times per second.
>
> So I wonder:
>
> * Is the count in the 8253 *always* supposed to be less than LATCH-1
> (==11931)?
>
> * Is there anything besides flakey hardware that might cause this
> behaviour?
>
> regards,
> d.
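
For reference, the arithmetic in the quoted line can be tried out in user
space. This is a sketch under assumptions, not the kernel source: the quoted
mail shows only the first statement, the rounding divide by LATCH is how I
recall the next line of time.c in kernels of this era, and TICK_SIZE is
assumed to be 10000 microseconds (HZ=100). The helper name delay_from_count
is made up. LATCH-1 works out to 11931, which matches the figure above.

```c
/* Assumed values for HZ=100 on i386. */
#define HZ              100
#define CLOCK_TICK_RATE 1193180                         /* i8253 input clock in Hz */
#define LATCH           ((CLOCK_TICK_RATE + HZ/2) / HZ) /* 11932 */
#define TICK_SIZE       10000                           /* assumption: usec per tick */

/*
 * Hypothetical helper: microseconds since the timer last reloaded,
 * computed from a latched i8253 count. The timer counts down from
 * LATCH-1, so a smaller count means more time has passed.
 */
static int delay_from_count(int count)
{
    count = ((LATCH - 1) - count) * TICK_SIZE;
    return (count + LATCH / 2) / LATCH;   /* assumed rounding step */
}
```

With a count of 11931 this gives 0, and with a count of 0 it gives 9999, so
any count in the expected range yields a delay within one tick. A count above
LATCH-1, say 12000, makes the first expression negative and the resulting
delay negative as well, which would be consistent with the negative time
steps described.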

-- 
Christopher Thompson  http://hypocrite.org/
         "Flawed, weak, organic."
