Re: [PATCH tip/core/rcu 13/22] rcu: Fix grace-period hangs due to race with CPU offline

From: Peter Zijlstra
Date: Wed Jun 27 2018 - 13:52:08 EST


On Wed, Jun 27, 2018 at 08:57:21AM -0700, Paul E. McKenney wrote:
> > Another variant, which simply skips the wakeup whenever run on an offline
> > CPU, relying on the wakeup from rcutree_migrate_callbacks() right after
> > the CPU really is dead.
>
> Cute! ;-)
>
> And a much smaller change.
>
> However, this means that if someone indirectly and erroneously causes
> rcu_report_qs_rsp() to be invoked from an offline CPU, the result is an
> intermittent and difficult-to-debug grace-period hang. A lockdep splat
> whose stack trace directly implicates the culprit is much better.

How so? We do an unconditional wakeup right after finding the offline
CPU dead. There is only very limited code between offline being true and
the CPU reporting in dead.
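
To make that window concrete, here is a rough user-space model in C. It
is only a sketch of the ordering argument: every name in it is made up
for illustration, none of it is kernel code, and it assumes the dead-CPU
path always runs after the skipped report.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
struct gp_model {
	bool cpu_online;	/* reporting CPU still marked online? */
	bool gp_thread_woken;	/* grace-period thread woken yet? */
	int skipped_wakeups;	/* wakeups skipped because CPU was offline */
};

/* Models the tail of a quiescent-state report. */
static void model_report_qs(struct gp_model *m)
{
	if (!m->cpu_online) {
		/* Offline: skip the wakeup, rely on the later one. */
		m->skipped_wakeups++;
		return;
	}
	m->gp_thread_woken = true;
}

/* Models the path taken once the CPU has been found dead. */
static void model_cpu_dead(struct gp_model *m)
{
	m->gp_thread_woken = true;	/* unconditional wakeup */
}

/* Walks the offline window; returns whether the thread ends up woken. */
static bool model_offline_window(void)
{
	struct gp_model m = { .cpu_online = true };

	m.cpu_online = false;		/* CPU goes offline */
	model_report_qs(&m);		/* report lands in the window */
	assert(!m.gp_thread_woken);	/* wakeup was skipped here */
	assert(m.skipped_wakeups == 1);

	model_cpu_dead(&m);		/* CPU reports in dead */
	return m.gp_thread_woken;
}
```

The model is single-threaded and says nothing about how long the window
is or about reports arriving from an offline CPU outside it.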