Re: [PATCH 1/2] perf/x86/intel: enable CPU ref_cycles for GP counter

From: Peter Zijlstra
Date: Tue May 30 2017 - 13:40:28 EST


On Tue, May 30, 2017 at 10:22:08AM -0700, Andi Kleen wrote:
> > > You would only need a single one per system however, not one per CPU.
> > > RCU already tracks all the CPUs, all we need is a single NMI watchdog
> > > that makes sure RCU itself does not get stuck.
> > >
> > > So we just have to find a single watchdog somewhere that can trigger
> > > NMI.
> >
> > But then you have to IPI broadcast the NMI, which is less than ideal.
>
> Only when the watchdog times out to print the backtraces.

The current NMI watchdog has a per-cpu state. So that means either doing
for_each_cpu() loops or IPI broadcasts from the NMI tickle. Neither is
something you really want.

> > RCU doesn't have that problem because the quiescent state is a global
> > thing. CPU progress, which is what the NMI watchdog tests, is very much
> > per logical CPU though.
>
> RCU already has a CPU stall detector. It should work (and usually
> triggers before the NMI watchdog in my experience unless the
> whole system is dead)

It only goes and looks at CPU state once it detects that the global QS is
stalled, I think. But I've not had much luck with the RCU one -- although I
think it's been improved since I last had a hard problem.