Re: [PATCH] barriers: introduce smp_mb__release_acquire and update documentation

From: Will Deacon
Date: Wed Sep 16 2015 - 06:29:19 EST


Hi Paul, Peter,

Thanks for the comments. More below...

On Wed, Sep 16, 2015 at 10:14:52AM +0100, Peter Zijlstra wrote:
> On Tue, Sep 15, 2015 at 10:47:24AM -0700, Paul E. McKenney wrote:
> > > diff --git a/arch/powerpc/include/asm/barrier.h b/arch/powerpc/include/asm/barrier.h
> > > index 0eca6efc0631..919624634d0a 100644
> > > --- a/arch/powerpc/include/asm/barrier.h
> > > +++ b/arch/powerpc/include/asm/barrier.h
> > > @@ -87,6 +87,7 @@ do { \
> > > ___p1; \
> > > })
> > >
> > > +#define smp_mb__release_acquire() smp_mb()
> >
> > If we are handling locking the same as atomic acquire and release
> > operations, this could also be placed between the unlock and the lock.
>
> I think the point was exactly that we need to separate LOCK/UNLOCK from
> ACQUIRE/RELEASE.

Yes, pending the PPC investigation, I'd like to keep this separate for
now.

> > However, independently of the unlock/lock case, this definition and
> > use of smp_mb__release_acquire() does not handle full ordering of a
> > release by one CPU and an acquire of that same variable by another.
>
> > In that case, we need roughly the same setup as the much-maligned
> > smp_mb__after_unlock_lock(). So, do we care about this case? (RCU does,
> > though not 100% sure about any other subsystems.)
>
> Indeed, that is a hole in the definition, that I think we should close.

I'm struggling to understand the hole, but here's my intuition. If an
ACQUIRE on CPUx reads from a RELEASE by CPUy, then I'd expect CPUx to
observe all memory accesses performed by CPUy prior to the RELEASE
before it observes the RELEASE itself, regardless of this new barrier.
I think this matches what we currently have in memory-barriers.txt (i.e.
acquire/release are neither transitive nor multi-copy atomic).
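
To make that concrete, the sort of thing I have in mind is (just a
sketch; x, flag and the r* locals are made-up names, not anything from
the patch):

        int x, flag;

        /* CPUy */
        WRITE_ONCE(x, 1);
        smp_store_release(&flag, 1);    /* RELEASE */

        /* CPUx */
        r1 = smp_load_acquire(&flag);   /* ACQUIRE, reads from the RELEASE */
        r2 = READ_ONCE(x);

        /* If r1 == 1, then r2 == 1: CPUx sees CPUy's store to x. */

What acquire/release alone doesn't promise is that a third CPU has also
observed CPUy's store to x at that point -- that's the transitivity /
multi-copy atomicity part.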

Do we have use-cases that need these extra guarantees (outside of the
single RCU case, which is using smp_mb__after_unlock_lock)? I'd rather
not augment smp_mb__release_acquire unless we really have to, so I'd
prefer to document that it only applies when the RELEASE and ACQUIRE are
performed by the same CPU. Thoughts?
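
For the same-CPU case, the usage I'd document is roughly the following
(again just a sketch, with made-up variables a and b):

        /* All on one CPU */
        smp_store_release(&a, 1);       /* RELEASE of a */
        smp_mb__release_acquire();
        r1 = smp_load_acquire(&b);      /* ACQUIRE of b */

        /*
         * Without the barrier, the store-release and the subsequent
         * load-acquire of a *different* variable need not be ordered
         * with respect to other CPUs (e.g. store->load reordering on
         * TSO); with it, the pair orders like a full barrier.
         */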

> > > #define smp_mb__before_atomic() smp_mb()
> > > #define smp_mb__after_atomic() smp_mb()
> > > #define smp_mb__before_spinlock() smp_mb()
> > > diff --git a/arch/x86/include/asm/barrier.h b/arch/x86/include/asm/barrier.h
> > > index 0681d2532527..1c61ad251e0e 100644
> > > --- a/arch/x86/include/asm/barrier.h
> > > +++ b/arch/x86/include/asm/barrier.h
> > > @@ -85,6 +85,8 @@ do { \
> > > ___p1; \
> > > })
> > >
> > > +#define smp_mb__release_acquire() smp_mb()
> > > +
> > > #endif
> > >
>
> All TSO archs would want this.

If we look at all architectures that implement smp_store_release without
an smp_mb already, we get:

ia64
powerpc
s390
sparc
x86

so it should be enough to provide those with definitions. I'll do that
once we've settled on the documentation bits.
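
Each of those would just grow a one-liner mirroring the x86 and powerpc
hunks above, e.g. (untested sketch, sparc file name from memory):

        /* arch/sparc/include/asm/barrier_64.h */
        #define smp_mb__release_acquire()       smp_mb()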

> > > /* Atomic operations are already serializing on x86 */
> > > diff --git a/include/asm-generic/barrier.h b/include/asm-generic/barrier.h
> > > index b42afada1280..61ae95199397 100644
> > > --- a/include/asm-generic/barrier.h
> > > +++ b/include/asm-generic/barrier.h
> > > @@ -119,5 +119,9 @@ do { \
> > > ___p1; \
> > > })
> > >
> > > +#ifndef smp_mb__release_acquire
> > > +#define smp_mb__release_acquire() do { } while (0)
> >
> > Doesn't this need to be barrier() in the case where one variable was
> > released and another was acquired?
>
> Yes, I think it's very prudent to never let any barrier degrade to less
> than barrier().

Hey, I just copied read_barrier_depends from the same file! Both
smp_load_acquire and smp_store_release should already provide at least
barrier(), so the above should be sufficient.
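
For an arch that also picks up the generic smp_store_release and
smp_load_acquire, the pair expands to roughly the following (sketch from
memory, not the exact macro bodies):

        /* smp_store_release(&a, 1): */
        smp_mb();                       /* at least barrier() */
        WRITE_ONCE(a, 1);

        smp_mb__release_acquire();      /* empty is fine here */

        /* r1 = smp_load_acquire(&b): */
        r1 = READ_ONCE(b);
        smp_mb();                       /* at least barrier() */

so nothing in the sequence degrades below barrier().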

Will