Re: [PATCH v6 28/29] x86/tsc: Restart NMI watchdog after refining tsc_khz

From: Ricardo Neri
Date: Tue May 17 2022 - 18:04:38 EST


On Tue, May 10, 2022 at 09:16:21PM +1000, Nicholas Piggin wrote:
> Excerpts from Ricardo Neri's message of May 6, 2022 10:00 am:
> > The HPET hardlockup detector relies on tsc_khz to estimate the value of
> > that the TSC will have when its HPET channel fires. A refined tsc_khz
> > helps to estimate better the expected TSC value.
> >
> > Using the early value of tsc_khz may lead to a large error in the expected
> > TSC value. Restarting the NMI watchdog detector has the effect of kicking
> > its HPET channel and make use of the refined tsc_khz.
> >
> > When the HPET hardlockup is not in use, restarting the NMI watchdog is
> > a noop.
> >
> > Cc: Andi Kleen <ak@xxxxxxxxxxxxxxx>
> > Cc: Stephane Eranian <eranian@xxxxxxxxxx>
> > Cc: "Ravi V. Shankar" <ravi.v.shankar@xxxxxxxxx>
> > Cc: iommu@xxxxxxxxxxxxxxxxxxxxxxxxxx
> > Cc: linuxppc-dev@xxxxxxxxxxxxxxxx
> > Cc: x86@xxxxxxxxxx
> > Signed-off-by: Ricardo Neri <ricardo.neri-calderon@xxxxxxxxxxxxxxx>
> > ---
> > Changes since v5:
> > * Introduced this patch
> >
> > Changes since v4
> > * N/A
> >
> > Changes since v3
> > * N/A
> >
> > Changes since v2:
> > * N/A
> >
> > Changes since v1:
> > * N/A
> > ---
> > arch/x86/kernel/tsc.c | 6 ++++++
> > 1 file changed, 6 insertions(+)
> >
> > diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
> > index cafacb2e58cc..cc1843044d88 100644
> > --- a/arch/x86/kernel/tsc.c
> > +++ b/arch/x86/kernel/tsc.c
> > @@ -1386,6 +1386,12 @@ static void tsc_refine_calibration_work(struct work_struct *work)
> > /* Inform the TSC deadline clockevent devices about the recalibration */
> > lapic_update_tsc_freq();
> >
> > + /*
> > + * If in use, the HPET hardlockup detector relies on tsc_khz.
> > + * Reconfigure it to make use of the refined tsc_khz.
> > + */
> > + lockup_detector_reconfigure();
>
> I don't know if the API is conceptually good.
>
> You change something that the lockup detector is currently using,
> *while* the detector is running asynchronously, and then reconfigure
> it.

Yes, this is what happens.

> What happens in the window? If this code is only used for small
> adjustments maybe it does not really matter

Please see my comment

> but in principle it's a bad API to export.
>
> lockup_detector_reconfigure as an internal API is okay because it
> reconfigures things while the watchdog is stopped

I see.

> [actually that looks untrue for soft dog which uses watchdog_thresh in
> is_softlockup(), but that should be fixed].

Perhaps there should be a watchdog_thresh_user. When the user updates it,
the detector is stopped, watchdog_thresh is updated, and then the detector
is resumed.

>
> You're the arch so you're allowed to stop the watchdog and configure
> it, e.g., hardlockup_detector_perf_stop() is called in arch/.

I had it like this but it did not look right to me. You are right, however,
I can stop/restart the watchdog when needed.

Thanks and BR,
Ricardo