Re: [PATCH v7 clocksource 3/5] clocksource: Check per-CPU clock synchronization when marked unstable

From: Paul E. McKenney
Date: Mon Apr 12 2021 - 14:20:52 EST


On Mon, Apr 12, 2021 at 03:08:16PM +0200, Thomas Gleixner wrote:
> On Sun, Apr 11 2021 at 21:21, Paul E. McKenney wrote:
> > On Sun, Apr 11, 2021 at 09:46:12AM -0700, Paul E. McKenney wrote:
> >> So what I need to do is inline clocksource_verify_percpu_wq()
> >> into clocksource_verify_percpu() and then move the call to
> >> clocksource_verify_percpu() to __clocksource_watchdog_kthread(), right
> >> before the existing call to list_del_init(). Will do!
> >
> > Except that this triggers the WARN_ON_ONCE() in smp_call_function_single()
> > due to interrupts being disabled across that list_del_init().
> >
> > Possibilities include:
> >
> > 1. Figure out why interrupts must be disabled only sometimes while
> > holding watchdog_lock, in the hope that they need not be across
> > the entire critical section for __clocksource_watchdog_kthread().
> > As in:
> >
> > local_irq_restore(flags);
> > clocksource_verify_percpu(cs);
> > local_irq_save(flags);
> >
> > Trying this first with lockdep enabled. Might be spectacular.
>
> Yes, it's a possible deadlock against the watchdog timer firing ...

And lockdep most emphatically agrees with you. ;-)

> The reason for irqsave is again historical AFAICT and nobody bothered to
> clean it up. spin_lock_bh() should be sufficient to serialize against
> the watchdog timer, though I haven't looked at all possible scenarios.

Though if BH is disabled, there is not much advantage to
invoking it from __clocksource_watchdog_kthread(). Might as
well just invoke it directly from clocksource_watchdog().

> > 2. Invoke clocksource_verify_percpu() from its original
> > location in clocksource_watchdog(), just before the call to
> > __clocksource_unstable(). This relies on the fact that
> > clocksource_watchdog() acquires watchdog_lock without
> > disabling interrupts.
>
> That should be fine, but this might cause the softirq to 'run' for a
> very long time which is not pretty either.
>
> Aside of that, do we really need to check _all_ online CPUs? What you
> are trying to figure out is whether the wreckage is CPU local or global,
> right?
>
> Wouldn't a shirt-sleeve approach of just querying _one_ CPU be good
> enough? Either the other CPU has the same wreckage, then it's global or
> it hasn't which points to a per CPU local issue.
>
> Sure it does not catch the case where a subset (>1) of all CPUs is
> affected, but I'm not seeing how that really buys us anything.

Good point! My thought is to randomly pick eight CPUs to keep the
duration reasonable while having a good chance of hitting "interesting"
CPU choices in multicore and multisocket systems.

However, if a hard-to-reproduce problem occurred, it would be good to take
the hit and scan all the CPUs. Additionally, there are some workloads
for which the switch from TSC to HPET is fatal anyway due to increased
overhead. For these workloads, the full CPU scan is no additional pain.

So I am thinking in terms of a default that probes eight randomly selected
CPUs without worrying about duplicates (as in there would be some chance
that fewer CPUs would actually be probed), but with a boot-time flag
that does all CPUs. I would add the (default) random selection as a
separate patch.

I will send a new series out later today, Pacific Time.

Thanx, Paul