Re: [PATCH v2 2/2] tools/memory-model: Make ppo a subrelation of po

From: Jonas Oberhauser
Date: Fri Jan 27 2023 - 09:32:21 EST




On 1/26/2023 5:36 PM, Alan Stern wrote:
On Thu, Jan 26, 2023 at 02:46:04PM +0100, Jonas Oberhauser wrote:
As stated in the documentation and implied by its name, the ppo
(preserved program order) relation is intended to link po-earlier
to po-later instructions under certain conditions. However, a
corner case currently allows instructions to be linked by ppo that
are not executed by the same thread, i.e., instructions are being
linked that have no po relation.

This happens due to the mb/strong-fence relations, which (as one
case) provide order when locks are passed between threads followed
by an smp_mb__after_unlock_lock() fence. This is illustrated in
the following litmus test (as can be seen when using herd7 with
`doshow ppo`):

P0(int *x, int *y)
{
spin_lock(x);
spin_unlock(x);
}

P1(int *x, int *y)
{
spin_lock(x);
smp_mb__after_unlock_lock();
*y = 1;
}

The ppo relation will link P0's spin_lock(x) and P1's *y=1, because
P0 passes a lock to P1 which then uses this fence.

The patch makes ppo a subrelation of po by eliminating this possibility
from mb (but not strong-fence) and relying explicitly on mb|gp instead
of strong-fence when defining ppo.

Signed-off-by: Jonas Oberhauser <jonas.oberhauser@xxxxxxxxxxxxxxx>
---
This changes the meaning of the fence relation, which is used in
w-pre-bounded, w-post-bounded, ww-vis, wr-vis, and rw-xbstar. Have you
checked that they won't be affected by the change?

Good point, in fact it changes the nonrw-fence as well which is used in the r-* relations.
I had missed this completely. That's what I get for not looking at the data race relations!


Let's go through the individual cases.
If we had e.g. *-post-bounded because of the po-unlock-lock-po;[After-unlock-lock];po edge, this may be either due to the fence rule or due to (...fence ; vis ; *-pre-bounded).
In the first case we have po-rel ; rfe ; acq-po and get fence;rfe;(xbstar&int);fence which gives us *-post-bounded.
In the second case we now have strong-fence, with vis <= xbstar we see that **-vis is preserved here by switching from the fence rule to the strong-fence;xbstar;... case.

For *-pre-bounded, the situtation is tricky because of the xbstar&int that can be at the end of vis, when *-pre-bounded is used to define w*-vis through (w-post-bounded;vis;*-pre-bounded). In this case, the xbstar&int can point in the opposite direction of po, which means that the unlock that creates the strong fence might not be po-after the instruction that starts the link.

Here's a litmus test illustrating the difference, where P1 has a backwards-pointing xbstar&int. Currently there's no data race, but with the proposed patch there is.

P0(int *x, int *y)
{
    *x = 1;
    smp_store_release(y, 1);
}

P1(int *x, int *y, int *dx, int *dy, spinlock_t *l)
{
    spin_lock(l);
    int r1 = READ_ONCE(*dy);
    if (r1==1)
        spin_unlock(l);

    int r0 = smp_load_acquire(y);
    if (r0 == 1) {
        WRITE_ONCE(*dx,1);
    }
}

P2(int *dx, int *dy)
{
    WRITE_ONCE(*dy,READ_ONCE(*dx));
}


P3(int *x, spinlock_t *l)
{
    spin_lock(l);
    smp_mb__after_unlock_lock();
    *x = 2;
}


This actually makes me wonder. I thought the reason for the xbstar & int is that it ensures that the overall relation, after shuffling around a little bit, becomes prop&int ; hb*.

Like in case the *x=2 is co-before the *x=1, we get
  Wx2 ->overwrite Wx1 ->cumul-fence*;rfe  (some event on the same CPU as Wx2)  ->fence Wx2
which is
  Wx2 -> prop&int some other event ->hb Wx2
which must be irreflexive.

However, that's not the case at all, because the fence relation currently doesn't actually have to relate events of the same CPU.
So we get
  Wx2 ->overwrite Wx1 ->cumul-fence*;rfe  (some event on some other CPU than Wx2's) ->(hb*&int);fence Wx2
i.e.,
  Wx2 ->prop&ext;hb*;strong-fence Wx2

which shouldn't provide any ordering in general.

In fact, replacing the *x=1 and *x=2 with WRITE_ONCEs, (pilot errors notwithstanding) both Wx1 ->co Wx2 and Wx2 ->co Wx1 become allowed in the current LKMM (in graphs where all other edges are equal).

Shouldn't this actually *be* a data race? And potentially the same with rcu-fence?

The other cases of *-pre-bounded seem to work out: they all link stuff via xbstar to the instruction that is linked via po-unlock-lock-po ; [After-unlock-lock] ; po to the potentially racy access, and po-unlock-lock-po;po   is xbstar ; acq-po, which allows closing the gap.

Best wishes, jonas