Re: Internal vs. external barriers (was: Re: Interesting LKMM litmus test)

From: Jonas Oberhauser
Date: Tue Jan 24 2023 - 14:31:57 EST




On 1/24/2023 6:26 PM, Paul E. McKenney wrote:
On Tue, Jan 24, 2023 at 05:39:53PM +0100, Jonas Oberhauser wrote:

On 1/24/2023 5:22 PM, Paul E. McKenney wrote:
I clearly recall some
store-based lack of ordering after a grace period from some years back,
and am thus far failing to reproduce it.

And here is another attempt that herd7 actually does allow.

So what did I mess up this time? ;-)

Thanx, Paul

------------------------------------------------------------------------

C C-srcu-observed-4

(*
* Result: Sometimes
*
* The Linux-kernel implementation is suspected to forbid this.
*)

{}

P0(int *x, int *y, int *z, struct srcu_struct *s)
{
int r1;

r1 = srcu_read_lock(s);
WRITE_ONCE(*y, 2);
WRITE_ONCE(*x, 1);
srcu_read_unlock(s, r1);
}

P1(int *x, int *y, int *z, struct srcu_struct *s)
{
int r1;

WRITE_ONCE(*y, 1);
synchronize_srcu(s);
WRITE_ONCE(*z, 2);
}

P2(int *x, int *y, int *z, struct srcu_struct *s)
{
WRITE_ONCE(*z, 1);
smp_store_release(x, 2);
}

exists (x=1 /\ y=1 /\ z=1)
I think even if you implement the unlock as mb() followed by some store that
is read by the gp between mb()s, this would still be allowed.
The implementation of synchronize_srcu() has quite a few smp_mb()
invocations.

But exactly how are you modeling this? As in what additional accesses
and memory barriers are you placing in which locations?

Along these lines:

P0(int *x, int *y, int *z, int *magic_location)
{
int r1;


WRITE_ONCE(*y, 2);
WRITE_ONCE(*x, 1);

smp_mb();
WRITE_ONCE(*magic_location, 1);

}

P1(int *x, int *y, int *z, int *magic_location)
{
int r1;

WRITE_ONCE(*y, 1);

smp_mb();
while (! READ_ONCE(*magic_location))
;
smp_mb();
WRITE_ONCE(*z, 2);
}


P2(int *x, int *y, int *z, struct srcu_struct *s)
{
WRITE_ONCE(*z, 1);
smp_store_release(x, 2);
}



Note that you can add as many additional smp_mb() and other accesses as you want around the original srcu call sites. I don't see how they could influence the absence of a cycle.

(Also, to make it work with herd it seems you need to replace the loop with a single read and state in the exists clause that it happens to read a 1.)

I have already forgotten the specifics, but I think the power model allows
certain stores never propagating somewhere?
PowerPC would forbid the 3.2W case, where each process used an
smp_store_release() as its sole ordering (no smp_mb() calls at all).

[...]

This propagation is modulated by the memory barriers, though.

Ah, looking at the model now. Indeed it's forbidden, because in order to say that something is in co, there must not be a (resulting) cycle of co and barriers. But you'd get that here.  In the axiomatic model, this corresponds to saying Power's "prop | co" is acyclic. The same isn't true in LKMM. So that's probably why.

Have fun, jonas