Re: [PATCH tip/core/rcu 1/9] rcu: Provide GP ordering in face of migrations and delays
From: Steven Rostedt
Date: Thu Oct 05 2017 - 10:13:12 EST
On Thu, 5 Oct 2017 15:40:42 +0200
Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> On Thu, Oct 05, 2017 at 09:17:03AM -0400, Steven Rostedt wrote:
> > On Wed, 4 Oct 2017 14:29:27 -0700
> > "Paul E. McKenney" <paulmck@xxxxxxxxxxxxxxxxxx> wrote:
> >
> > > Consider the following admittedly improbable sequence of events:
> > >
> > > o RCU is initially idle.
> > >
> > > o Task A on CPU 0 executes rcu_read_lock().
> >
> > A starts rcu_read_lock() critical section.
> >
> > >
> > > o Task B on CPU 1 executes synchronize_rcu(), which must
> > > wait on Task A:
> >
> > B waits for A.
> >
> > >
> > > o Task B registers the callback, which starts a new
> > > grace period, awakening the grace-period kthread
> > > on CPU 3, which immediately starts a new grace period.
> >
> > [ isn't B blocked (off rq)? How does it migrate? ]
>
> No, its running synchronize_rcu() but hasn't blocked yet. It would block
> on wait_for_completion(), but per the very last point, we'll observe the
> complete() before we block.
Hmm, maybe this should be stated as well, as it looked confusing.
>
> > > o Task B migrates to CPU 2, which provides a quiescent
> > > state for both CPUs 1 and 2.
> > >
> > > o Both CPUs 1 and 2 take scheduling-clock interrupts,
> > > and both invoke RCU_SOFTIRQ, both thus learning of the
> > > new grace period.
> > >
> > > o Task B is delayed, perhaps by vCPU preemption on CPU 2.
> > >
> > > o CPUs 2 and 3 pass through quiescent states, which are reported
> > > to core RCU.
> > >
> > > o Task B is resumed just long enough to be migrated to CPU 3,
> > > and then is once again delayed.
> > >
> > > o Task A executes rcu_read_unlock(), exiting its RCU read-side
> > > critical section.
> >
> > A calls rcu_read_unlock() ending the critical section
>
> The point is that rcu_read_unlock() doesn't have memory ordering.
Ah, that makes a difference. Can we state that in the change log too.
>
> > >
> > > o CPU 0 passes through a quiescent sate, which is reported to
> > > core RCU. Only CPU 1 continues to block the grace period.
> > >
> > > o CPU 1 passes through a quiescent state, which is reported to
> > > core RCU. This ends the grace period, and CPU 1 therefore
> > > invokes its callbacks, one of which awakens Task B via
> > > complete().
> > >
> > > o Task B resumes (still on CPU 3) and starts executing
> > > wait_for_completion(), which sees that the completion has
> > > already completed, and thus does not block. It returns from
> > > the synchronize_rcu() without any ordering against the
> > > end of Task A's RCU read-side critical section.
> >
> > B runs
> >
> >
> > >
> > > It can therefore mess up Task A's RCU read-side critical section,
> > > in theory, anyway.
> >
> > I don't see how B ran during A's critical section.
>
> It didn't but that doesn't mean the memory ordering agrees. What we
> require is B observes (per the memory ordering) everything up to and
> including the rcu_read_unlock(). This is not 'time' related.
OK, that makes sense. It was just missing from the change log, so mere
mortals like myself can keep up. (This is why I'm Batman. I'm only
human but I can still compete with superheros ;-)
>
>
> That said, I don't think it can actually happen, because CPU0's QS state
> is ordered against the complete and the wait_for_completion is ordered
> against the complete.
OK, but this doesn't hurt to have. Just a paranoid check, which when it
comes to RCU, leaning toward paranoid is good.
-- Steve