Re: [PATCH v4 4/6] infiniband: cxgb4: Eliminate duplicate barriers on weakly-ordered archs

From: Sinan Kaya
Date: Thu Mar 22 2018 - 17:28:06 EST


On 3/22/2018 4:45 PM, Casey Leedom wrote:
> Yes, but ...
>
> For instance, I see that the x86 writel() has "memory" in its asm(), which
> prevents GCC from reordering generated instructions. And it ~looks like~
> arm64 ~sort of~ gets that with the inclusion of __iowmb() (which translates
> to wmb() then dsb(st) which finally holds the GCC "memory" barrier). Is
> this part of the documented semantic of the writel_relaxed()? The PowerPC
> stuff simply defines writel_relaxed() as writel() and I can't find the
> bottom of that Rabbit Hole ...
>

This is changing. See the "RFC on writel and writel_relaxed" thread. The PowerPC
maintainers are looking for a way to implement a relaxed variant.

What matters is the description in the barriers document. See also the
section "MMIO access primitives" here, which covers mmiowb():

https://lwn.net/Articles/697539/


> I'm guessing~ that this line in the documentation ~may~ imply the GCC
> ordering:
>
> ... Note that relaxed accesses to
> the same peripheral are guaranteed to be ordered with respect to each
> other. ...
>

That ordering can come from a compiler barrier on some arches and/or be
architecturally guaranteed, as with ARM64's Device-nGnRE mapping (non-gathering,
non-reordering, with early write acknowledgement).

Both writel() and writel_relaxed() need to guarantee the order in which the HW
observes the writes to the device.

Where they differ is in what they guarantee about ordering against the code
surrounding the write, as you identified.
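
To make the difference concrete, here is a rough userspace model of a fast-path
doorbell write. It is a sketch, not kernel code: every model_* name is
hypothetical, the fence only stands in for wmb(), and it assumes writel()
behaves like a write barrier followed by writel_relaxed(), as in the arm64 case
you describe above. The counter is there only so the duplicate barrier is
visible.

```c
#include <stdint.h>
#include <stdatomic.h>

/* Hypothetical userspace stand-ins; NOT the kernel definitions. */
static unsigned int model_barrier_count;

/* Stand-in for wmb(): a full fence, counted so duplicates show up. */
static inline void model_wmb(void)
{
	atomic_thread_fence(memory_order_seq_cst);
	model_barrier_count++;
}

/* Stand-in for writel_relaxed(): a plain volatile store, no barrier. */
static inline void model_writel_relaxed(uint32_t val, volatile uint32_t *addr)
{
	*addr = val;
}

/* Stand-in for writel(): assumed to be barrier + relaxed store. */
static inline void model_writel(uint32_t val, volatile uint32_t *addr)
{
	model_wmb();
	model_writel_relaxed(val, addr);
}

/* Hypothetical ring: descriptors in memory plus a device doorbell. */
struct model_ring {
	uint32_t desc[4];
	volatile uint32_t doorbell;
};

/* Explicit barrier followed by writel(): two barriers per doorbell. */
static void ring_doorbell_duplicate(struct model_ring *r, uint32_t pidx)
{
	r->desc[pidx % 4] = 0xabcd0000u | pidx;	/* descriptor in memory */
	model_wmb();				/* order it before the doorbell */
	model_writel(pidx, &r->doorbell);	/* carries a second barrier */
}

/* Explicit barrier followed by writel_relaxed(): one barrier. */
static void ring_doorbell_single(struct model_ring *r, uint32_t pidx)
{
	r->desc[pidx % 4] = 0xabcd0000u | pidx;	/* descriptor in memory */
	model_wmb();				/* the only barrier needed */
	model_writel_relaxed(pidx, &r->doorbell);
}
```

In this model both variants leave the same value in the doorbell; the second
one simply does not pay for the barrier twice, which is the point of the
fast-path changes.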

> In any case, we really only have a few places where we (the various Chelsio
> drivers) need to worry about this: the "Fast Paths" where we have a lot of
> I/O to the device. I think we should leave everything else alone.

Makes sense.

>
> Casey
>


--
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.