Re: [PATCH v2 2/2] tools/memory-model: Make ppo a subrelation of po

From: Paul E. McKenney
Date: Sun Jan 29 2023 - 14:36:11 EST


On Sun, Jan 29, 2023 at 08:21:56AM -0800, Paul E. McKenney wrote:
> On Sun, Jan 29, 2023 at 11:03:46AM -0500, Alan Stern wrote:
> > On Sat, Jan 28, 2023 at 09:17:34PM -0800, Paul E. McKenney wrote:
> > > On Sat, Jan 28, 2023 at 05:59:52PM -0500, Alan Stern wrote:
> > > > On Sat, Jan 28, 2023 at 11:14:17PM +0100, Andrea Parri wrote:
> > > > > > Evidently the plain-coherence check rules out x=1 at the
> > > > > > end, because when I relax that check, x=1 becomes a possible result.
> > > > > > Furthermore, the graphical output confirms that this execution has a
> > > > > > ww-incoh edge from Wx=2 to Wx=1. But there is no ww-vis edge from Wx=1
> > > > > > to Wx=2! How can this be possible? It seems like a bug in herd7.
> > > > >
> > > > > By default, herd7 performs some edges removal when generating the
> > > > > graphical outputs. The option -showraw can be useful to increase
> > > > > the "verbosity", for example,
> > > > >
> > > > > [with "exists (x=2)", output in /tmp/T.dot]
> > > > > $ herd7 -conf linux-kernel.cfg T.litmus -show prop -o /tmp -skipchecks plain-coherence -doshow ww-vis -showraw ww-vis
> > > >
> > > > Okay, thanks, that helps a lot.
> > > >
> > > > So here's what we've got. The litmus test:
> > > >
> > > >
> > > > C hb-and-int
> > > > {}
> > > >
> > > > P0(int *x, int *y)
> > > > {
> > > > *x = 1;
> > > > smp_store_release(y, 1);
> > > > }
> > > >
> > > > P1(int *x, int *y, int *dx, int *dy, spinlock_t *l)
> > > > {
> > > > spin_lock(l);
> > > > int r1 = READ_ONCE(*dy);
> > > > if (r1==1)
> > > > spin_unlock(l);
> > > >
> > > > int r0 = smp_load_acquire(y);
> > > > if (r0 == 1) {
> > > > WRITE_ONCE(*dx,1);
> > > > }
> > >
> > > The lack of a spin_unlock() when r1!=1 is intentional?
> >
> > I assume so.
> >
> > > It is admittedly a cute way to prevent P3 from doing anything
> > > when r1!=1. And P1 won't do anything if P3 runs first.
> >
> > Right.
> >
> > > > }
> > > >
> > > > P2(int *dx, int *dy)
> > > > {
> > > > WRITE_ONCE(*dy,READ_ONCE(*dx));
> > > > }
> > > >
> > > >
> > > > P3(int *x, spinlock_t *l)
> > > > {
> > > > spin_lock(l);
> > > > smp_mb__after_unlock_lock();
> > > > *x = 2;
> > > > }
> > > >
> > > > exists (x=2)
> > > >
> > > >
> > > > The reason why Wx=1 ->ww-vis Wx=2:
> > > >
> > > > 0:Wx=1 ->po-rel 0:Wy=1 and po-rel < fence < ww-post-bounded.
> > > >
> > > > 0:Wy=1 ->rfe 1:Ry=1 ->(hb* & int) 1:Rdy=1 and
> > > > (rfe ; hb* & int) <= (rfe ; xbstar & int) <= vis.
> > > >
> > > > 1:Rdy=1 ->po 1:unlock ->rfe 3:lock ->po 3:Wx=2
> > > > so 1:Rdy=1 ->po-unlock-lock-po 3:Wx=2
> > > > and po-unlock-lock-po <= mb <= fence <= w-pre-bounded.
> > > >
> > > > Finally, w-post-bounded ; vis ; w-pre-bounded <= ww-vis.
> > > >
> > > > This explains why the memory model says there isn't a data race. This
> > > > doesn't use the smp_mb__after_unlock_lock at all.
> > >
> > > You lost me on this one.
> > >
> > > Suppose that P3 starts first, then P0. P1 is then stuck at the
> > > spin_lock() because P3 does not release that lock. P2 goes out for a
> > > pizza.
> >
> > That wouldn't be a valid execution. One of the rules in lock.cat says
> > that a spin_lock() call must read from a spin_unlock() or from an
> > initial write, which rules out executions in which P3 acquires the lock
> > first.
>
> OK, I will bite...
>
> Why can't P3's spin_lock() read from that initial write?
>
> > > Why can't the two stores to x by P0 and P3 conflict, resulting in a
> > > data race?
> >
> > That can't happen in executions where P1 acquires the lock first for the
> > reason outlined above (P0's store to x propagates to P3 before P3 writes
> > to x). And there are no other executions -- basically, herd7 ignores
> > deadlock scenarios.
>
> True enough, if P1 gets there first, then P3 never stores to x.
>
> What I don't understand is why P1 must always get there first.

Ah, is the rule that all processes must complete?

If so, then the only way all processes can complete is if P1 loads 1
from dy, thus releasing the lock.

But that dy=1 load can only happen after P2 has copied dx to dy, and has
stored a 1. Which in turn only happens if P1's write to dx is ordered
before the lock release. Which only executes if P1's load from y returned
1, which only happens after P0 stored 1 to y.

Which means that P3 gets the lock only after P0 completes its plain
write to x, so there is no data race.

The reason that P3 cannot go first is that this will prevent P1 from
completing, in turn preventing herd7 from counting that execution.

Or am I still missing a turn in there somewhere?

Thanx, Paul