Re: [RFC][PATCH] mips: Fix arch_spin_unlock()

From: Paul E. McKenney
Date: Thu Jan 28 2016 - 17:31:51 EST


On Thu, Jan 28, 2016 at 09:57:19AM +0000, Will Deacon wrote:
> On Wed, Jan 27, 2016 at 03:38:36PM -0800, Paul E. McKenney wrote:
> > On Wed, Jan 27, 2016 at 03:21:58PM +0000, Will Deacon wrote:
> > > On Wed, Jan 27, 2016 at 03:54:21PM +0100, Peter Zijlstra wrote:
> > > > On Wed, Jan 27, 2016 at 11:43:48AM +0000, Will Deacon wrote:
> > > > > Do you know whether a SYNC 18 (RELEASE) followed in program order by a
> > > > > SYNC 17 (ACQUIRE) creates a full barrier (i.e. something like SYNC 16)?
> > > > >
> > > > > If not, you may need to implement smp_mb__after_unlock_lock for RCU
> > > > > to ensure globally transitive unlock->lock ordering should you decide
> > > > > to relax your locking barriers.
> > > >
> > > > You know that is a tricky question. Maybe its easier if you give the 3
> > > > cpu litmus test that goes with it.
> > >
> > > Sure, I was building up to that. I just wanted to make sure the basics
> > > were there (program-order, so same CPU) before we go any further. It
> > > sounds like they are, so that's promising.
> > >
> > > > Maciej, the tricky point is what, if any, effect the
> > > > SYNC_RELEASE+SYNC_ACQUIRE pair has on an unrelated CPU. Please review
> > > > the TRANSITIVITY section in Documentation/memory-barriers.txt and
> > > > replace <general barrier> with the RELEASE+ACQUIRE pair.
> > > >
> > > > We've all (Will, Paul and me) had much 'fun' trying to decipher the
> > > > MIPS64r6 manual but failed to reach a conclusion on this.
> > >
> > > For the inter-thread case, Paul had a previous example along the lines
> > > of:
> > >
> > >
> > > Wx=1
> > > WyRel=1
> > >
> > > RyAcq=1
> > > Rz=0
> > >
> > > Wz=1
> > > smp_mb()
> > > Rx=0
> >
> > Each paragraph being a separate thread, correct? If so, agreed.
>
> Yes, sorry for the shorthand:
>
> - Each paragraph is a separate thread
> - Wx=1 means WRITE_ONCE(x, 1), Rx=1 means READ_ONCE(x) returns 1
> - WxRel means smp_store_release(x,1), RxAcq=1 means smp_load_acquire(x)
> returns 1
> - Everything is initially zero
>
> > > and I suppose a variant of that:
> > >
> > >
> > > Wx=1
> > > WyRel=1
> > >
> > > RyAcq=1
> > > Wz=1
> > >
> > > Rz=1
> > > <address dependency>
> > > Rx=0
> >
> > Agreed, this would be needed as well, along with the read-read and
> > read-write variants. I picked the write-read version (Will's first
> > test above) because write-read reordering is the most likely on
> > hardware that I am aware of.
>
> Question: if you replaced "Wz=1" with "WzRel=1" in my second test, would
> it then be forbidden?

On Power, yes. I would guess on ARM as well.

For Linux in general, this is a question: How strict do we want to be
about matching the type of write with the corresponding read? My
default approach is to initially be quite strict and loosen as needed.
Here "quite strict" might mean requiring an rcu_assign_pointer() for
the write and rcu_dereference() for the read, as opposed to (say)
ACCESS_ONCE() for the read. (I am guessing that this would be too
tight, but it makes a good example.)

Thoughts?

Thanx, Paul