Re: [PATCH 3/4] watchdog/hardlockup: improve buddy system detection timeliness

From: Doug Anderson

Date: Thu Mar 05 2026 - 11:46:02 EST


Hi,

On Thu, Mar 5, 2026 at 5:47 AM Petr Mladek <pmladek@xxxxxxxx> wrote:
>
> > --- a/kernel/watchdog.c
> > +++ b/kernel/watchdog.c
> > @@ -163,8 +171,13 @@ static bool is_hardlockup(unsigned int cpu)
> >  {
> >  	int hrint = atomic_read(&per_cpu(hrtimer_interrupts, cpu));
> >
> > -	if (per_cpu(hrtimer_interrupts_saved, cpu) == hrint)
> > -		return true;
> > +	if (per_cpu(hrtimer_interrupts_saved, cpu) == hrint) {
> > +		per_cpu(hrtimer_interrupts_missed, cpu)++;
> > +		if (per_cpu(hrtimer_interrupts_missed, cpu) >= watchdog_hardlockup_miss_thresh)
>
> This would return true for every check when missed >= 3.
> As a result, the hardlockup would be reported every 4s.
>
> I would keep the 12s cadence and change this to:
>
> if (per_cpu(hrtimer_interrupts_missed, cpu) % watchdog_hardlockup_miss_thresh == 0)

I could be confused, but I don't think this is needed because we clear
"hrtimer_interrupts_missed" to 0 any time we save the timer count.
While I believe the "%" will functionally work, it seems harder to
understand, at least to me.
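
For illustration only, here is a minimal userspace sketch of the counter as I
read it from the hunk above. It is not the kernel code: `struct cpu_state`,
`check()`, `first_flag()` and `MISS_THRESH` are hypothetical names, and the
reset on a changed timer count is modelled from the description in this mail,
since that part of the patch is not quoted here.

```c
#include <assert.h>
#include <stdbool.h>

#define MISS_THRESH 3 /* hypothetical stand-in for watchdog_hardlockup_miss_thresh */

struct cpu_state {
	int saved;  /* models hrtimer_interrupts_saved */
	int missed; /* models hrtimer_interrupts_missed */
};

/* One check of a CPU; returns true when a lockup would be flagged. */
static bool check(struct cpu_state *s, int hrint, bool use_modulo)
{
	if (s->saved == hrint) {
		s->missed++;
		if (use_modulo)
			return s->missed % MISS_THRESH == 0;
		return s->missed >= MISS_THRESH;
	}

	/* Timer advanced: save the new count and clear the miss counter. */
	s->saved = hrint;
	s->missed = 0;
	return false;
}

/* Returns the 1-based index of the first flagged sample, or 0 if none. */
static int first_flag(const int *hrint, int n, bool use_modulo)
{
	struct cpu_state s = { .saved = 0, .missed = 0 };

	for (int i = 0; i < n; i++)
		if (check(&s, hrint[i], use_modulo))
			return i + 1;
	return 0;
}
```

In this model both forms flag a stuck CPU on the third consecutive missed
sample, and neither flags a CPU whose timer advances before that. What happens
on samples after the first flag depends on code that is not part of this
excerpt, so the sketch deliberately stops there.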


> > +			return true;
> > +
> > +		return false;
> > +	}
> >
> >  	/*
> >  	 * NOTE: we don't need any fancy atomic_t or READ_ONCE/WRITE_ONCE
> > --- a/kernel/watchdog_buddy.c
> > +++ b/kernel/watchdog_buddy.c
> > @@ -86,14 +87,6 @@ void watchdog_buddy_check_hardlockup(int hrtimer_interrupts)
> >  {
> >  	unsigned int next_cpu;
> >
> > -	/*
> > -	 * Test for hardlockups every 3 samples. The sample period is
> > -	 * watchdog_thresh * 2 / 5, so 3 samples gets us back to slightly over
> > -	 * watchdog_thresh (over by 20%).
> > -	 */
> > -	if (hrtimer_interrupts % 3 != 0)
> > -		return;
>
> It would be symmetric with the "% 3" above.

Here we weren't resetting the count, so the "%" _was_ important. In
the new code where we're resetting the count back to 0...
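
To make the old cadence concrete, here is a small sketch of the arithmetic in
the removed comment. The helper names are hypothetical and this is not kernel
code; the usage note assumes a watchdog_thresh of 10 seconds, which is
consistent with the 4s and 12s figures quoted above.

```c
#include <assert.h>
#include <stdbool.h>

/* Sample period as described in the removed comment: watchdog_thresh * 2 / 5. */
static int sample_period(int watchdog_thresh)
{
	return watchdog_thresh * 2 / 5;
}

/*
 * Old gating: the interrupt count is free-running and never reset, so
 * the modulo is what limits the hardlockup test to every third sample.
 */
static bool old_check_runs(int hrtimer_interrupts)
{
	return hrtimer_interrupts % 3 == 0;
}
```

With watchdog_thresh at 10, sample_period() gives 4 and three samples span 12,
which is the "over by 20%" in the comment. Over interrupt counts 1 through 9,
old_check_runs() is true three times.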

-Doug