Re: [PATCH clocksource 5/6] clocksource: Suspend the watchdog temporarily when high read latency detected

From: Feng Tang
Date: Wed Jan 11 2023 - 20:02:30 EST


On Wed, Jan 11, 2023 at 01:32:10PM -0800, Paul E. McKenney wrote:
> On Wed, Jan 11, 2023 at 10:19:50PM +0100, Thomas Gleixner wrote:
> > On Wed, Jan 11 2023 at 09:50, Paul E. McKenney wrote:
> > > On Wed, Jan 11, 2023 at 12:26:58PM +0100, Thomas Gleixner wrote:
> > > Yes, if a system was 100% busy forever, this patch would suppress these
> > > checks. But 100% busy forever is not the common case, due to thermal
> > > throttling and to security updates if nothing else.
> > >
> > > With all that said, is there a better way to get the desired effects of
> > > this patch?
> >
> > Sane hardware?
>
> I must let Feng talk to his systems, but most of the systems I saw were
> production systems. A few were engineering samples, from which some
> insanity might be expected behavior.

I've tested several generations of Xeon servers, and all of them
reproduce the issue under a stress-ng load. Those platforms were
not bought from the market :), but they have the latest stepping and
firmware, so they are close to production systems.

The issue originally came from a customer, and there were engineers
who reproduced it on production systems (even from different vendors).

Thanks,
Feng

> Clearly, something about the hardware or firmware was insane in order
> to get this result, but that is what diagnostics are for, even on
> engineering samples.
>
> Thanx, Paul