False "lost ticks" on dual-Opteron system (=> timer twice as fast)

From: Bernd Paysan
Date: Sun May 08 2005 - 07:47:00 EST


Hi,

I've recently set up a dual Opteron RAID server (AMD-8000-based Tyan
Thunder K8S Pro SCSI board, two Opteron 246 CPUs, stepping 10). The
kernel is a modified 2.6.11.4-20a from SuSE 9.3 (the SMP version, of
course). The Opterons are capable of changing the CPU frequency
(between 1GHz and 2GHz).

The system clock runs (on average) about twice as fast as it should.
A closer look revealed that the clock jumps forward by about 10-30
seconds every 10-30 seconds (plus other oddities, including backward
clock jumps). The timer interrupts are distributed roughly evenly
between the two CPUs, but watching the timer interrupt counts
(grep timer /proc/interrupts) revealed that for about 10-30 seconds
one CPU gets all the interrupts, and then the other CPU gets them; the
transition is what causes the system clock to jump forward.

A quick look at timer_interrupt shows what I suspect is the culprit:
Each CPU keeps track of the last TSC at a timer interrupt, and adds the
"lost" ticks to jiffies when perceived necessary. If there's only a
single jiffies, but two vxtime.last_tsc, it can't work.
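
To make the suspected failure mode concrete, here is a small user-space
toy model (not kernel code). It assumes per-CPU TSC snapshots as
described above, perfectly synchronized TSCs, and timer interrupts that
go to one CPU for a while and then to the other. The names toy_clock,
naive_timer_interrupt, simulate_naive and the CYCLES_PER_TICK value are
made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>

#define NCPUS 2
/* Assumed toy value: a 2 GHz TSC with 1000 timer ticks per second. */
#define CYCLES_PER_TICK 2000000ULL

/* One shared tick counter, but one TSC snapshot per CPU. */
struct toy_clock {
    uint64_t jiffies;
    uint64_t last_tsc[NCPUS];
};

/*
 * Naive handler: every whole tick beyond the first since *this CPU's*
 * last timer interrupt is treated as lost and added to the shared count.
 */
static void naive_timer_interrupt(struct toy_clock *c, int cpu, uint64_t tsc)
{
    uint64_t elapsed;

    assert(cpu >= 0 && cpu < NCPUS);
    elapsed = (tsc - c->last_tsc[cpu]) / CYCLES_PER_TICK;
    if (elapsed > 1)
        c->jiffies += elapsed - 1;  /* "lost" ticks */
    c->jiffies++;                   /* the tick being handled now */
    c->last_tsc[cpu] = tsc;
}

/*
 * Deliver `blocks` runs of `ticks_per_block` consecutive timer interrupts,
 * alternating the receiving CPU between runs, and return the final jiffies.
 */
static uint64_t simulate_naive(unsigned blocks, unsigned ticks_per_block)
{
    struct toy_clock c = {0};
    uint64_t tick = 0;
    unsigned b, i;

    for (b = 0; b < blocks; b++) {
        for (i = 0; i < ticks_per_block; i++) {
            tick++;
            naive_timer_interrupt(&c, (int)(b % NCPUS),
                                  tick * CYCLES_PER_TICK);
        }
    }
    return c.jiffies;
}
```

In this model simulate_naive(4, 1000) returns 7000 although only 4000
real ticks were delivered: every hand-over of the interrupt to the other
CPU re-adds the whole run that the first CPU had already counted, so the
clock approaches twice the real rate.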

A quick workaround would be to drop the handling of "lost" jiffies
altogether. I would still expect annoying time skews from
do_gettimeoffset() (that's what explains the other oddities: if I call
gettimeofday() on the CPU that isn't getting the interrupts, it is going
to add the "lost" jiffies, too). A proposed fix would be to *also* store
the last jiffies value in the vxtime variable, and to check whether it
was really *this* CPU that missed the timer interrupts. This local
"last-stored-jiffies" value can also help do_gettimeoffset() calculate
the local time well enough on both CPUs.
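
The proposed fix could look roughly like the following user-space sketch
(again a toy model, not a kernel patch). It assumes synchronized TSCs
and keeps a jiffies snapshot next to each CPU's TSC snapshot; the names
fixed_clock, fixed_timer_interrupt, fixed_gettimeoffset and the
CYCLES_PER_TICK value are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define NCPUS 2
/* Assumed toy value: a 2 GHz TSC with 1000 timer ticks per second. */
#define CYCLES_PER_TICK 2000000ULL

/* Shared tick counter plus a (TSC, jiffies) snapshot per CPU. */
struct fixed_clock {
    uint64_t jiffies;
    uint64_t last_tsc[NCPUS];
    uint64_t last_jiffies[NCPUS];
};

/*
 * Only add ticks that nobody has accounted for yet: compare where this
 * CPU's TSC says jiffies should be with where the shared counter is.
 */
static void fixed_timer_interrupt(struct fixed_clock *c, int cpu, uint64_t tsc)
{
    uint64_t elapsed, target;

    assert(cpu >= 0 && cpu < NCPUS);
    elapsed = (tsc - c->last_tsc[cpu]) / CYCLES_PER_TICK;
    target = c->last_jiffies[cpu] + elapsed;
    if (target > c->jiffies + 1)
        c->jiffies = target;    /* really lost ticks, plus this one */
    else
        c->jiffies++;           /* the other CPU already counted the rest */
    c->last_tsc[cpu] = tsc;
    c->last_jiffies[cpu] = c->jiffies;
}

/*
 * Cycles since the last accounted tick as seen from `cpu`, even if that
 * CPU has not handled a timer interrupt for a long time.
 */
static uint64_t fixed_gettimeoffset(const struct fixed_clock *c, int cpu,
                                    uint64_t tsc)
{
    uint64_t since_snapshot, accounted;

    assert(cpu >= 0 && cpu < NCPUS);
    since_snapshot = tsc - c->last_tsc[cpu];
    accounted = (c->jiffies - c->last_jiffies[cpu]) * CYCLES_PER_TICK;
    return since_snapshot > accounted ? since_snapshot - accounted : 0;
}
```

With this bookkeeping, handing the interrupt over to the other CPU adds
exactly one tick, a genuinely missed interrupt is still made up for, and
the offset computed on the CPU that is not receiving interrupts stays
below one tick.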

What I can't believe is that I'm the only one who has this problem.

<rant>I know the timer system on an Intel or AMD system is broken by
design, because there should be a single constant-clocked atomically
read-only system-wide timer. But this is no excuse for that ;-).</rant>

--
Bernd Paysan
"If you want it done right, you have to do it yourself"
http://www.jwdt.com/~paysan/
