Re: [PATCH RFC nohz_full 6/7] nohz_full: Add full-system-idle state machine

From: Paul E. McKenney
Date: Thu Jul 18 2013 - 12:49:09 EST


On Thu, Jul 18, 2013 at 04:24:51PM +0200, Frederic Weisbecker wrote:
> On Wed, Jul 17, 2013 at 08:39:21PM -0700, Paul E. McKenney wrote:
> > On Thu, Jul 18, 2013 at 03:33:01AM +0200, Frederic Weisbecker wrote:
> > > So it's like:
> > >
> > >    CPU 0           CPU 1
> > >
> > >    read I          write I
> > >    smp_mb()        smp_mb()
> > >    cmpxchg S       read S
> > >
> > > I still can't find what guarantees we don't read a value in CPU 1 that is way below
> > > what we want.
> >
> > One key point is that there is a second cycle from LONG to FULL.
> >
> > (Not saying that there is not a bug -- there might well be. In fact,
> > I am starting to think that I need to do another Promela model...
>
> Now I'm very confused :)

To quote a Nobel Laureate who presented at an ISEF here in Portland some
years back, "Confusion is the most productive state of mind." ;-)

> I'm far from being a specialist on these matters but I would really love to
> understand this patchset. Is there any documentation somewhere I can read
> that could help, something about cycles of committed memory or something?

Documentation/memory-barriers.txt should suffice for this. If you want
more rigor, see http://www.cl.cam.ac.uk/~pes20/ppc-supplemental/test7.pdf

But memory-barrier pairing suffices here. Here is case 2 from my
earlier email in more detail. The comments with capital letters
mark important memory barriers, some of which are buried in atomic
operations.

1. Some CPU coming out of idle:

o rcu_sysidle_exit():

smp_mb__before_atomic_inc();
atomic_inc(&rdtp->dynticks_idle);
smp_mb__after_atomic_inc(); /* A */

o rcu_sysidle_force_exit():

oldstate = ACCESS_ONCE(full_sysidle_state);

2. RCU GP kthread:

o rcu_sysidle():

cmpxchg(&full_sysidle_state, RCU_SYSIDLE_SHORT, RCU_SYSIDLE_LONG);
/* B */

o rcu_sysidle_check_cpu():

cur = atomic_read(&rdtp->dynticks_idle);

Memory barrier A pairs with memory barrier B, so that if #1's load
from full_sysidle_state sees RCU_SYSIDLE_SHORT, we know that #1's
atomic_inc() must be visible to #2's atomic_read(). This will cause #2
to recognize that the CPU came out of idle, which will in turn cause it
to invoke rcu_sysidle_cancel() instead of rcu_sysidle(), resulting in
full_sysidle_state being set to RCU_SYSIDLE_NOT.
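
For concreteness, here is a minimal userspace sketch of the same pairing
pattern using C11 atomics in place of the kernel's smp_mb()/atomic_inc()/
cmpxchg(). The names (idle_count, sysidle_state, cpu_exit_idle(),
gp_kthread_scan()) are made up for illustration and are not the code from
the patch:

	#include <stdatomic.h>

	enum { SYSIDLE_NOT, SYSIDLE_SHORT, SYSIDLE_LONG, SYSIDLE_FULL };

	static atomic_int idle_count;                     /* stand-in for rdtp->dynticks_idle */
	static atomic_int sysidle_state = SYSIDLE_SHORT;  /* stand-in for full_sysidle_state */

	/* Role of #1: the CPU coming out of idle. */
	int cpu_exit_idle(void)
	{
		/* seq_cst RMW supplies the full ordering around the inc ("A"). */
		atomic_fetch_add(&idle_count, 1);
		return atomic_load(&sysidle_state);       /* may or may not see SYSIDLE_SHORT */
	}

	/* Role of #2: the RCU grace-period kthread. */
	int gp_kthread_scan(void)
	{
		int expected = SYSIDLE_SHORT;

		/* seq_cst cmpxchg supplies the full ordering ("B"). */
		atomic_compare_exchange_strong(&sysidle_state, &expected, SYSIDLE_LONG);
		return atomic_load(&idle_count);
	}

Assuming each function runs exactly once on its own thread, this is the
classic store-buffering pattern: the outcome where cpu_exit_idle() returns
SYSIDLE_SHORT (misses the cmpxchg's store) while gp_kthread_scan() returns
0 (misses the increment) is forbidden, which is exactly the guarantee the
A/B pairing above relies on.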

Thanx, Paul

> > > > Unfortunately, the reasoning in #2 above does not hold in the small-CPU
> > > > case because there is the possibility of both the timekeeping CPU and
> > > > the RCU grace-period kthread concurrently advancing the state machine.
> > > > This would be bad, good catch!!!
> > >
> > > It's not like I spotted anything myself but you're welcome :)
> >
> > I will take them any way I can get them. ;-)
> >
> > > > The patch below (untested) is an attempt to fix this. If it actually
> > > > works, I will merge it in with 6/7.
> > > >
> > > > Anything else I missed? ;-)
> > >
> > > Well I guess I'll wait one more night before trying to understand
> > > the below ;)
> >
> > The key point is that the added check means that either the timekeeping
> > CPU is advancing the state machine (if there are few CPUs) or the
> > RCU grace-period kthread is (if there are many CPUs), but never both.
> > Or that is the intent, anyway!
>
> Yeah got that.
>
> Thanks!
>
