Re: [PATCH clocksource 5/6] clocksource: Suspend the watchdog temporarily when high read latency detected

From: Feng Tang
Date: Wed Jan 11 2023 - 07:34:58 EST


On Wed, Jan 11, 2023 at 12:26:58PM +0100, Thomas Gleixner wrote:
> On Wed, Jan 04 2023 at 17:07, Paul E. McKenney wrote:
> > This can be reproduced by running memory intensive 'stream' tests,
> > or some of the stress-ng subcases such as 'ioport'.
> >
> > The reason for these issues is that when the system is under heavy load, the
> > read latency of the clocksources can be very high. Even lightweight TSC
> > reads can show high latencies, and latencies are much worse for external
> > clocksources such as HPET or the APIC PM timer. These latencies can
> > result in false-positive clocksource-unstable determinations.
> >
> > Given that the clocksource watchdog is a continual diagnostic check with
> > a frequency of twice a second, there is no need to rush it when the system
> > is under heavy load. Therefore, when high clocksource read latencies
> > are detected, suspend the watchdog timer for 5 minutes.
>
> We should have enough heuristics in place by now to qualify the output of
> the clocksource watchdog as a random number generator, right?

The issue was found on an 8-socket machine (around 400 cores, 800 CPUs).
It seems that as CPU counts keep growing, the sporadic latency spikes
when reading HPET or even TSC become very large, which does hurt the
accuracy of the clocksource watchdog check. Unfortunately, we can't
simply disable the watchdog on these 8-socket machines.
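
The patch's alternative to disabling the watchdog outright (when a read is slow, skip the stability verdict and back off for 5 minutes) can be sketched in plain C. This is a minimal sketch, not the kernel code: the names wd_decide() and struct wd_decision and the 100 us threshold are hypothetical; only the 500 ms interval and the 5-minute suspension come from the patch description quoted above.

```c
#include <stdbool.h>
#include <stdint.h>

#define WD_INTERVAL_MS  500ULL               /* watchdog runs twice a second */
#define WD_SUSPEND_MS   (5ULL * 60 * 1000)   /* 5-minute back-off from the patch text */
#define WD_MAX_READ_NS  100000ULL            /* hypothetical threshold: 100 us */

/* Hypothetical result of one watchdog round. */
struct wd_decision {
	bool evaluate;            /* trust this sample for a stable/unstable verdict? */
	uint64_t next_delay_ms;   /* when to run the next watchdog round */
};

/* Decide what to do given the measured clocksource read latency. */
static struct wd_decision wd_decide(uint64_t read_latency_ns)
{
	struct wd_decision d;

	if (read_latency_ns > WD_MAX_READ_NS) {
		/* Read too slow: don't mark anything unstable, suspend for a while. */
		d.evaluate = false;
		d.next_delay_ms = WD_SUSPEND_MS;
	} else {
		/* Normal case: evaluate this sample, keep the usual interval. */
		d.evaluate = true;
		d.next_delay_ms = WD_INTERVAL_MS;
	}
	return d;
}
```

With these assumed values, a 300 us read (like the deltas mentioned below) would skip the verdict and defer the next check by 300000 ms.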

We tried a debug patch that disables interrupts, does consecutive
'rdtsc' reads, and checks the delta between them. With the system
running a heavy load, we sometimes saw deltas of up to 300 microseconds
on a 2-socket Icelake machine. The results were similar when 'rdtsc'
was replaced with 'rdtscp' or 'lfence;rdtsc'. I was told the maximum
latency is much higher on the 8-socket machine.
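
A rough user-space analogue of that debug patch might look like the following. It assumes an x86 build for __rdtsc() from <x86intrin.h> and falls back to CLOCK_MONOTONIC elsewhere; read_counter() and max_consecutive_delta() are hypothetical helpers. Unlike the debug patch, user space cannot disable interrupts, so the measured deltas also include interrupt and preemption time, and TSC deltas are in cycles, not microseconds.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdint.h>
#include <time.h>
#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>
#endif

/* Read a free-running counter: TSC cycles on x86, nanoseconds elsewhere. */
static inline uint64_t read_counter(void)
{
#if defined(__x86_64__) || defined(__i386__)
	return __rdtsc();
#else
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
#endif
}

/* Do n back-to-back reads and return the largest gap between consecutive reads. */
static uint64_t max_consecutive_delta(unsigned long n)
{
	uint64_t prev = read_counter();
	uint64_t max = 0;

	for (unsigned long i = 0; i < n; i++) {
		uint64_t now = read_counter();
		uint64_t d = now - prev;

		if (d > max)
			max = d;
		prev = now;
	}
	return max;
}
```

Running max_consecutive_delta() with a large n while a load such as 'stream' is active, then dividing by the TSC frequency, gives a rough picture of the worst read-to-read gap observed.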

Thanks,
Feng