Re: [RFC PATCH v4 12/21] watchdog/hardlockup/hpet: Adjust timer expiration on the number of monitored CPUs

From: Ricardo Neri
Date: Tue Jun 18 2019 - 18:52:22 EST


On Tue, Jun 11, 2019 at 10:11:04PM +0200, Thomas Gleixner wrote:
> On Thu, 23 May 2019, Ricardo Neri wrote:
> > @@ -52,10 +59,10 @@ static void kick_timer(struct hpet_hld_data *hdata, bool force)
> >  		return;
> >
> >  	if (hdata->has_periodic)
> > -		period = watchdog_thresh * hdata->ticks_per_second;
> > +		period = watchdog_thresh * hdata->ticks_per_cpu;
> >
> >  	count = hpet_readl(HPET_COUNTER);
> > -	new_compare = count + watchdog_thresh * hdata->ticks_per_second;
> > +	new_compare = count + watchdog_thresh * hdata->ticks_per_cpu;
> >  	hpet_set_comparator(hdata->num, (u32)new_compare, (u32)period);
>
> So with this you might get close to the point where you trip over the SMI
> induced madness where CPUs vanish for several milliseconds in some value
> add code. You really want to do a read back of the hpet to detect that. See
> the comment in the hpet code. RHEL 7/8 allow up to 768 logical CPUs....

Do you mean adding a readback to check if the new compare value is
greater than the current count? Similar to the check at the end of
hpet_next_event():

	return res < HPET_MIN_CYCLES ? -ETIME : 0;

In such a case, should it try to set the comparator again? I think it
should, as otherwise the hardlockup detector would stop working.
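
To make the question concrete, here is a minimal user-space sketch of the
readback-and-retry idea. It is not the patch code: the sim_* names, the
simulated registers, the retry limit and the margin of 128 ticks (meant to
mirror HPET_MIN_CYCLES) are all assumptions for illustration. The stall is
modeled by advancing the counter while the comparator write is in flight.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-ins for the HPET registers; not the kernel accessors. */
static uint32_t sim_counter;     /* free-running main counter */
static uint32_t sim_comparator;  /* one-shot comparator */
static uint32_t sim_stall_ticks; /* ticks lost to an SMI during the next write */

#define SIM_MIN_CYCLES 128       /* assumed margin, mirrors HPET_MIN_CYCLES */

static uint32_t sim_read_counter(void)
{
	return sim_counter;
}

static void sim_write_comparator(uint32_t cmp)
{
	sim_comparator = cmp;
	/* Model the CPU vanishing while the write is in flight. */
	sim_counter += sim_stall_ticks;
	sim_stall_ticks = 0;
}

/*
 * Program the comparator delta ticks ahead, then read the counter back.
 * If the counter has already passed the comparator, or is too close to
 * it, report -ETIME so that the caller can try again.
 */
static int sim_set_next(uint32_t delta)
{
	uint32_t cmp = sim_read_counter() + delta;
	int32_t res;

	sim_write_comparator(cmp);
	/* Signed distance, so the check survives 32-bit counter wraparound. */
	res = (int32_t)(cmp - sim_read_counter());

	return res < SIM_MIN_CYCLES ? -ETIME : 0;
}

/* Returns the number of attempts used, or -ETIME if every attempt failed. */
static int sim_kick_timer(uint32_t delta, int max_tries)
{
	int i;

	assert(max_tries > 0);

	for (i = 0; i < max_tries; i++) {
		if (sim_set_next(delta) == 0)
			return i + 1;
	}

	return -ETIME;
}
```

Each retry recomputes the comparator from a fresh counter read, so a stalled
attempt does not leave a stale deadline behind. A bounded retry count is one
possible choice; what to do when all attempts fail is left open here.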

Thanks and BR,
Ricardo
>
> Thanks,
>
> tglx