Re: [PATCH v6 29/29] x86/tsc: Switch to perf-based hardlockup detector if TSC become unstable

From: Ricardo Neri
Date: Mon May 16 2022 - 23:06:34 EST


On Tue, May 10, 2022 at 10:14:00PM +1000, Nicholas Piggin wrote:
> Excerpts from Ricardo Neri's message of May 6, 2022 10:00 am:
> > The HPET-based hardlockup detector relies on the TSC to determine if an
> > observed NMI interrupt was originated by HPET timer. Hence, this detector
> > can no longer be used with an unstable TSC.
> >
> > In such case, permanently stop the HPET-based hardlockup detector and
> > start the perf-based detector.
> >
> > Cc: Andi Kleen <ak@xxxxxxxxxxxxxxx>
> > Cc: Stephane Eranian <eranian@xxxxxxxxxx>
> > Cc: "Ravi V. Shankar" <ravi.v.shankar@xxxxxxxxx>
> > Cc: iommu@xxxxxxxxxxxxxxxxxxxxxxxxxx
> > Cc: linuxppc-dev@xxxxxxxxxxxxxxxx
> > Cc: x86@xxxxxxxxxx
> > Suggested-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> > Reviewed-by: Tony Luck <tony.luck@xxxxxxxxx>
> > Signed-off-by: Ricardo Neri <ricardo.neri-calderon@xxxxxxxxxxxxxxx>
> > ---
> > Changes since v5:
> > * Relocated the delcaration of hardlockup_detector_switch_to_perf() to
> > x86/nmi.h It does not depend on HPET.
> > * Removed function stub. The shim hardlockup detector is always for x86.
> >
> > Changes since v4:
> > * Added a stub version of hardlockup_detector_switch_to_perf() for
> > !CONFIG_HPET_TIMER. (lkp)
> > * Reconfigure the whole lockup detector instead of unconditionally
> > starting the perf-based hardlockup detector.
> >
> > Changes since v3:
> > * None
> >
> > Changes since v2:
> > * Introduced this patch.
> >
> > Changes since v1:
> > * N/A
> > ---
> > arch/x86/include/asm/nmi.h | 6 ++++++
> > arch/x86/kernel/tsc.c | 2 ++
> > arch/x86/kernel/watchdog_hld.c | 6 ++++++
> > 3 files changed, 14 insertions(+)
> >
> > diff --git a/arch/x86/include/asm/nmi.h b/arch/x86/include/asm/nmi.h
> > index 4a0d5b562c91..47752ff67d8b 100644
> > --- a/arch/x86/include/asm/nmi.h
> > +++ b/arch/x86/include/asm/nmi.h
> > @@ -63,4 +63,10 @@ void stop_nmi(void);
> > void restart_nmi(void);
> > void local_touch_nmi(void);
> >
> > +#ifdef CONFIG_X86_HARDLOCKUP_DETECTOR
> > +void hardlockup_detector_switch_to_perf(void);
> > +#else
> > +static inline void hardlockup_detector_switch_to_perf(void) { }
> > +#endif
> > +
> > #endif /* _ASM_X86_NMI_H */
> > diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
> > index cc1843044d88..74772ffc79d1 100644
> > --- a/arch/x86/kernel/tsc.c
> > +++ b/arch/x86/kernel/tsc.c
> > @@ -1176,6 +1176,8 @@ void mark_tsc_unstable(char *reason)
> >
> > clocksource_mark_unstable(&clocksource_tsc_early);
> > clocksource_mark_unstable(&clocksource_tsc);
> > +
> > + hardlockup_detector_switch_to_perf();
> > }
> >
> > EXPORT_SYMBOL_GPL(mark_tsc_unstable);
> > diff --git a/arch/x86/kernel/watchdog_hld.c b/arch/x86/kernel/watchdog_hld.c
> > index ef11f0af4ef5..7940977c6312 100644
> > --- a/arch/x86/kernel/watchdog_hld.c
> > +++ b/arch/x86/kernel/watchdog_hld.c
> > @@ -83,3 +83,9 @@ void watchdog_nmi_start(void)
> > if (detector_type == X86_HARDLOCKUP_DETECTOR_HPET)
> > hardlockup_detector_hpet_start();
> > }
> > +
> > +void hardlockup_detector_switch_to_perf(void)
> > +{
> > + detector_type = X86_HARDLOCKUP_DETECTOR_PERF;
>
> Another possible problem along the same lines here,
> isn't your watchdog still running at this point? And
> it uses detector_type in the switch.
>
> > + lockup_detector_reconfigure();
>
> Actually the detector_type switch is used in some
> functions called by lockup_detector_reconfigure()
> e.g., watchdog_nmi_stop, so this seems buggy even
> without concurrent watchdog.

Yes, this true. I missed this race.

>
> Is this switching a good idea in general? The admin
> has asked for non-standard option because they want
> more PMU counterss available and now it eats a
> counter potentially causing a problem rather than
> detecting one.

Agreed. A very valid point.
>
> I would rather just disable with a warning if it were
> up to me. If you *really* wanted to be fancy then
> allow admin to re-enable via proc maybe.

I think that in either case, /proc/sys/kernel/nmi_watchdog
need to be updated to reflect that the NMI watchdog has
been disabled. That would require to expose other interfaces
of the watchdog.

Thanks and BR,
Ricardo