Re: [PATCH 2/2] tools/memory-model: Add write ordering by release-acquire and by locks

From: Will Deacon
Date: Fri Jun 22 2018 - 06:38:13 EST


Hi Peter,

On Fri, Jun 22, 2018 at 12:31:29PM +0200, Peter Zijlstra wrote:
> On Fri, Jun 22, 2018 at 10:55:47AM +0100, Will Deacon wrote:
> > On Fri, Jun 22, 2018 at 09:09:28AM +0100, Will Deacon wrote:
> > > On Thu, Jun 21, 2018 at 01:27:12PM -0400, Alan Stern wrote:
> > > > More than one kernel developer has expressed the opinion that the LKMM
> > > > should enforce ordering of writes by release-acquire chains and by
> > > > locking. In other words, given the following code:
> > > >
> > > > WRITE_ONCE(x, 1);
> > > > spin_unlock(&s);
> > > > spin_lock(&s);
> > > > WRITE_ONCE(y, 1);
>
> So this is the one I'm relying on and really want sorted.

Agreed, and I think this one makes a lot of sense.

>
> > > > or the following:
> > > >
> > > > smp_store_release(&x, 1);
> > > > r1 = smp_load_acquire(&x); // r1 = 1
> > > > WRITE_ONCE(y, 1);
>
> Reading back some of the old threads [1], it seems the direct
> translation of the first into acquire-release would be:
>
> WRITE_ONCE(x, 1);
> smp_store_release(&s, 1);
> r1 = smp_load_acquire(&s);
> WRITE_ONCE(y, 1);
>
> Which I think is easier to make happen than the second example you give.

It's easier, but it will still break on architectures with native support
for RCpc acquire/release. For example, using LDAPR again:


AArch64 MP+popl-rfilq-poqp+poap
"PodWWPL RfiLQ PodRWQP RfePA PodRRAP Fre"
Generator=diyone7 (version 7.46+3)
Prefetch=0:x=F,0:z=W,1:z=F,1:x=T
Com=Rf Fr
Orig=PodWWPL RfiLQ PodRWQP RfePA PodRRAP Fre
{
0:X1=x; 0:X3=y; 0:X6=z;
1:X1=z; 1:X3=x;
}
 P0            | P1           ;
 MOV W0,#1     | LDAR W0,[X1] ;
 STR W0,[X1]   | LDR W2,[X3]  ;
 MOV W2,#1     |              ;
 STLR W2,[X3]  |              ;
 LDAPR W4,[X3] |              ;
 MOV W5,#1     |              ;
 STR W5,[X6]   |              ;
exists
(0:X4=1 /\ 1:X0=1 /\ 1:X2=0)


This outcome is permitted on arm64.

> > > > the stores to x and y should be propagated in order to all other CPUs,
> > > > even though those other CPUs might not access the lock s or be part of
> > > > the release-acquire chain. In terms of the memory model, this means
> > > > that rel-rf-acq-po should be part of the cumul-fence relation.
> > > >
> > > > All the architectures supported by the Linux kernel (including RISC-V)
> > > > do behave this way, albeit for varying reasons. Therefore this patch
> > > > changes the model in accordance with the developers' wishes.
> > >
> > > Interesting...
> > >
> > > I think the second example would preclude us using LDAPR for load-acquire,
> > > so I'm surprised that RISC-V is ok with this. For example, the first test
> > > below is allowed on arm64.
> > >
> > > I also think this would break if we used DMB LD to implement load-acquire
> > > (second test below).
> > >
> > > So I'm not a big fan of this change, and I'm surprised this works on all
> > > architectures. What's the justification?
> >
> > I also just realised that this prevents Power from using ctrl+isync to
> > implement acquire, should they wish to do so.
>
> They in fact do so on chips lacking LWSYNC, see how PPC_ACQUIRE_BARRIER
> (as used by atomic_*_acquire) turns into ISYNC (note however that they
> do not use PPC_ACQUIRE_BARRIER for smp_load_acquire -- because there's
> no CTRL there).

Right, so the example in the commit message is broken on PPC then. I think
it's also broken on RISC-V, despite the claim.

Could we drop the acquire/release stuff from the patch and limit this change
to locking instead?

Will