Re: Losing too many ticks! TSC cannot be used as time source...

From: john stultz
Date: Wed Nov 19 2003 - 17:43:05 EST


On Wed, 2003-11-19 at 03:21, Petr Vandrovec wrote:
> Hi,
> today morning my kernel (2.6.0-test9-something) said that
> TSC became unusable. There are no other messages around this
> time in the log. It could be thermal throttling, but it should
> print some message when X86_MCE_P4THERMAL is enabled, yes?
> It happened after 17hrs 56min of uptime. System never produced
> this message before.

Hmmm. Interesting. You haven't seen it before, and you've been running
2.6.0-testX for a while?

Any heavy cron jobs running at that time?

The message is printed after 100 consecutive timer ticks on which it
appears that the system has lost ticks. The assumption is that something
has gone wrong (speedstep, for instance) and we can no longer trust the
TSC as a time source. Thermal throttling is a possibility, but I've not
actually seen it occur. I'd have to defer to the cpufreq folks on that
one.

If we're getting false positives, we may have to bump that
100-consecutive-ticks number up.
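
As a rough illustration of the heuristic described above, here is a
minimal sketch in C. It is not the kernel's actual timer code: the
struct, the function names, and the way elapsed ticks are passed in are
all hypothetical, and only the "100 consecutive ticks" threshold comes
from this mail.

```c
#include <stdbool.h>

/* Hypothetical names throughout; this is NOT the kernel's real code. */

/* The "100 consecutive ticks" threshold mentioned above. Raising it
 * would make false positives less likely. */
#define LOST_TICK_THRESHOLD 100

struct tick_monitor {
    unsigned int consecutive_lost; /* current run of ticks that looked lost */
    bool tsc_trusted;              /* false once we stop using the TSC */
};

static void tick_monitor_init(struct tick_monitor *m)
{
    m->consecutive_lost = 0;
    m->tsc_trusted = true;
}

/*
 * Call once per timer interrupt. elapsed_ticks is the number of tick
 * periods the TSC claims have passed since the previous interrupt
 * (assumed to be computed elsewhere). A value above 1 means ticks
 * appear to have been lost.
 *
 * Returns true while the TSC is still trusted as a time source.
 */
static bool tick_monitor_update(struct tick_monitor *m,
                                unsigned int elapsed_ticks)
{
    if (!m->tsc_trusted)
        return false; /* already fell back; the decision is not undone */

    if (elapsed_ticks > 1) {
        /* Another interrupt in a row that appears to have lost ticks. */
        if (++m->consecutive_lost >= LOST_TICK_THRESHOLD)
            m->tsc_trusted = false;
    } else {
        /* A normal tick breaks the run, so start counting again. */
        m->consecutive_lost = 0;
    }
    return m->tsc_trusted;
}
```

Because a single normal tick resets the counter, occasional lost ticks
do not trip the fallback; only a sustained run does.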

Anything else quirky about the system?


> Are there some more information I could supply, or should I
> simple live with fact that TSC stopped working on my P4 today
> morning?

Well, the TSC didn't stop working; we just stopped using it as a time
source. Things should work fine using just the PIT, although I'd be
interested to hear whether ntpd finally settled down after the change.
If it cannot stay synced, it may be an issue with your system's PIT.

You may also want to check the ACPI PM timesource code in -mm4, and see
how that works for you.

thanks for the report!
-john

