Re: [RFC PATCH] clocksource: Suspend the watchdog temporarily when high read latency detected

From: Waiman Long
Date: Wed Dec 21 2022 - 22:42:52 EST


On 12/21/22 19:40, Paul E. McKenney wrote:
commit 199dfa2ba23dd0d650b1482a091e2e15457698b7
Author: Paul E. McKenney <paulmck@xxxxxxxxxx>
Date: Wed Dec 21 16:20:25 2022 -0800

clocksource: Verify HPET and PMTMR when TSC unverified

On systems with two or fewer sockets, when the boot CPU has CONSTANT_TSC,
NONSTOP_TSC, and TSC_ADJUST, clocksource watchdog verification of the
TSC is disabled. This works well much of the time, but there is the
occasional system that meets all of these criteria, but which still
has a TSC that skews significantly from atomic-clock time. This is
usually attributed to a firmware or hardware fault. Yes, the various
NTP daemons do express their opinions of userspace-to-atomic-clock time
skew, but they put them in various places, depending on the daemon and
distro in question. It would therefore be good for the kernel to have
some clue that there is a problem.

The old behavior of marking the TSC unstable is a non-starter because a
great many workloads simply cannot tolerate the overheads and latencies
of the various non-TSC clocksources. In addition, NTP-corrected systems
often seem to be able to tolerate significant kernel-space time skew as
long as the userspace time sources are within epsilon of atomic-clock
time.

Therefore, when watchdog verification of TSC is disabled, enable it for
HPET and PMTMR (AKA ACPI PM timer). This provides the needed in-kernel
time-skew diagnostic without degrading the system's performance.

Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxx>
Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Cc: Ingo Molnar <mingo@xxxxxxxxxx>
Cc: Borislav Petkov <bp@xxxxxxxxx>
Cc: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
Cc: "H. Peter Anvin" <hpa@xxxxxxxxx>
Cc: Daniel Lezcano <daniel.lezcano@xxxxxxxxxx>
Cc: Feng Tang <feng.tang@xxxxxxxxx>
Cc: Waiman Long <longman@xxxxxxxxxx>
Cc: <x86@xxxxxxxxxx>

As I currently understand it, you are trying to use the TSC as a watchdog to check HPET and PMTMR. I have two questions about this patch.

First, why do you need to use both HPET and PMTMR? Can you just use whichever one of them is available? Second, is it possible to enable this time-skew diagnostic for a limited amount of time instead of running it indefinitely? Running the clocksource watchdog still consumes a small number of CPU cycles.
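
To make the two questions concrete, here is a rough userspace sketch of what I have in mind. It is only a model under my own assumptions: `struct cs_model`, `pick_one_to_verify()` and `run_limited_checks()` are hypothetical names, not kernel APIs, and nothing here reflects how the patch is actually implemented.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of a clocksource; not a kernel structure. */
struct cs_model {
	const char *name;
	bool available;    /* present on this system */
	bool must_verify;  /* checked against the TSC when set */
};

/*
 * Question 1: mark only the first available candidate for verification,
 * instead of marking every candidate.
 * Returns the chosen entry, or NULL if none is available.
 */
static struct cs_model *pick_one_to_verify(struct cs_model *cs, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (cs[i].available) {
			cs[i].must_verify = true;
			return &cs[i];
		}
	}
	return NULL;
}

/*
 * Question 2: stop after max_checks comparisons rather than running
 * indefinitely. skew_ns[] holds made-up per-check skew samples.
 * Returns how many of the examined samples exceeded threshold_ns.
 */
static int run_limited_checks(const long *skew_ns, size_t n_samples,
			      size_t max_checks, long threshold_ns)
{
	int reports = 0;

	for (size_t i = 0; i < n_samples && i < max_checks; i++) {
		long s = skew_ns[i] < 0 ? -skew_ns[i] : skew_ns[i];

		if (s > threshold_ns)
			reports++;
	}
	return reports;
}
```

With a candidate list of HPET (unavailable) and PMTMR (available), only PMTMR would be marked for verification, and the check loop would stop after the configured number of comparisons.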

Cheers,
Longman