Re: [PATCH v2] arch/powerpc/include/asm/barrier.h: redefine rmb and wmb to lwsync

From: Kautuk Consul
Date: Wed Feb 22 2023 - 23:08:01 EST


On 2023-02-23 14:51:25, Michael Ellerman wrote:
> Hi Paul,
>
> "Paul E. McKenney" <paulmck@xxxxxxxxxx> writes:
> > On Wed, Feb 22, 2023 at 02:33:44PM +0530, Kautuk Consul wrote:
> >> A link from ibm.com states:
> >> "Ensures that all instructions preceding the call to __lwsync
> >> complete before any subsequent store instructions can be executed
> >> on the processor that executed the function. Also, it ensures that
> >> all load instructions preceding the call to __lwsync complete before
> >> any subsequent load instructions can be executed on the processor
> >> that executed the function. This allows you to synchronize between
> >> multiple processors with minimal performance impact, as __lwsync
> >> does not wait for confirmation from each processor."
> >>
> >> Thats why smp_rmb() and smp_wmb() are defined to lwsync.
> >> But this same understanding applies to parallel pipeline
> >> execution on each PowerPC processor.
> >> So, use the lwsync instruction for rmb() and wmb() on the PPC
> >> architectures that support it.
> >>
> >> Signed-off-by: Kautuk Consul <kconsul@xxxxxxxxxxxxxxxxxx>
> >> ---
> >> arch/powerpc/include/asm/barrier.h | 7 +++++++
> >> 1 file changed, 7 insertions(+)
> >>
> >> diff --git a/arch/powerpc/include/asm/barrier.h b/arch/powerpc/include/asm/barrier.h
> >> index b95b666f0374..e088dacc0ee8 100644
> >> --- a/arch/powerpc/include/asm/barrier.h
> >> +++ b/arch/powerpc/include/asm/barrier.h
> >> @@ -36,8 +36,15 @@
> >> * heavy-weight sync, so smp_wmb() can be a lighter-weight eieio.
> >> */
> >> #define __mb() __asm__ __volatile__ ("sync" : : : "memory")
> >> +
> >> +/* The sub-arch has lwsync. */
> >> +#if defined(CONFIG_PPC64) || defined(CONFIG_PPC_E500MC)
> >> +#define __rmb() __asm__ __volatile__ ("lwsync" : : : "memory")
> >> +#define __wmb() __asm__ __volatile__ ("lwsync" : : : "memory")
> >
> > Hmmm...
> >
> > Does the lwsync instruction now order both cached and uncached accesses?
>
> No.
>
> > Or have there been changes so that smp_rmb() and smp_wmb() get this
> > definition, while rmb() and wmb() still get the sync instruction?
>
> No.
>
> They come from:
>
> include/asm-generic/barrier.h:#define rmb() do { kcsan_rmb(); __rmb(); } while (0)
> include/asm-generic/barrier.h:#define wmb() do { kcsan_wmb(); __wmb(); } while (0)
>
> > (Not seeing this, but I could easily be missing something.)
>
> You are correct, the patch is wrong because it fails to account for IO
> accesses.
>
> Kautuk, I'm not sure what motivated you to look at these barriers, was
> it just the documentation you linked to?
>
> Maybe we need some better documentation in the kernel explaining why we
> use the barriers we do?
>
> Although there's already a pretty decent comment above the definitions,
> but maybe it needs further clarification:
>
> /*
> * Memory barrier.
> * The sync instruction guarantees that all memory accesses initiated
> * by this processor have been performed (with respect to all other
> * mechanisms that access memory). The eieio instruction is a barrier
> * providing an ordering (separately) for (a) cacheable stores and (b)
> * loads and stores to non-cacheable memory (e.g. I/O devices).
> *
> * mb() prevents loads and stores being reordered across this point.
> * rmb() prevents loads being reordered across this point.
> * wmb() prevents stores being reordered across this point.
> *
> * *mb() variants without smp_ prefix must order all types of memory
> * operations with one another. sync is the only instruction sufficient
> * to do this.
> *
> * For the smp_ barriers, ordering is for cacheable memory operations
> * only. We have to use the sync instruction for smp_mb(), since lwsync
> * doesn't order loads with respect to previous stores. Lwsync can be
> * used for smp_rmb() and smp_wmb().

Sorry I didn't change the comment.
My point is: Is the "sync is the only instruction sufficient to do this"
comment completely correct?
As I mentioned in my reply to Paul there I didn't find any
documentation that up-front states (in the differences between sync and
lwsync) that lwsync distinguishes between cached and unchached accesses.
>
>
> cheers