Re: [PATCH v2 01/21] docs/memory-barriers.txt: Rewrite "KERNEL I/O BARRIER EFFECTS" section

From: Benjamin Herrenschmidt
Date: Mon Apr 15 2019 - 00:07:21 EST


On Fri, 2019-04-12 at 14:17 +0100, Will Deacon wrote:
>
> + the same CPU thread to a particular device will arrive in program
> + order.
> +
> + 2. A writeX() by a CPU thread to the peripheral will first wait for the
> + completion of all prior writes to memory either issued by the thread
> + or issued while holding a spinlock that was subsequently taken by the
> + thread. This ensures that writes by the CPU to an outbound DMA
> + buffer allocated by dma_alloc_coherent() will be visible to a DMA
> + engine when the CPU writes to its MMIO control register to trigger
> + the transfer.

Not particularly trying to be annoying here but I find the above
rather hard to parse :) I know what you're getting at but I'm not sure
somebody who doesn't already know will understand.

One way would be to instead prefix the whole thing with a blurb along
the lines of:

readX() and writeX() provide some ordering guarantees versus
each other and other memory accesses that are described below.
Those guarantees apply to accesses performed either by the same
logical thread of execution, or by different threads but while
holding the same lock (spinlock or mutex).

Then have a simpler description of each case. No?
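
To make case 2 concrete, here is a rough userspace sketch of the pattern
it is meant to guarantee: fill the buffer, then ring the doorbell. None
of this is kernel code. mock_writel(), dma_buf, device_view and
doorbell_reg are made-up stand-ins, and the C11 release fence only
models the ordering that the real writel() is described as providing.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BUF_LEN 64

/* Hypothetical stand-in for a buffer from dma_alloc_coherent(). */
uint8_t dma_buf[BUF_LEN];
/* What the mock "device" fetched when its doorbell was rung. */
uint8_t device_view[BUF_LEN];
/* Hypothetical stand-in for the device's MMIO control register. */
uint32_t doorbell_reg;

/*
 * Mock of writel(): the release fence models "wait for all prior
 * writes to memory" before the register write is issued. The mock
 * device then snapshots the buffer, as a DMA engine would fetch it.
 */
void mock_writel(uint32_t val, uint32_t *reg)
{
	atomic_thread_fence(memory_order_release);
	*reg = val;
	if (val != 0)
		memcpy(device_view, dma_buf, BUF_LEN);
}

/* Fill the outbound buffer, then trigger the transfer. */
void start_transfer(const void *data, size_t len)
{
	if (len > BUF_LEN)
		len = BUF_LEN;
	memcpy(dma_buf, data, len);      /* prior writes to memory */
	mock_writel(1, &doorbell_reg);   /* must not overtake them */
}
```

The point is only that the driver author should not need an explicit
barrier between the memcpy() and the doorbell write.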

> + 3. A readX() by a CPU thread from the peripheral will complete before
> + any subsequent reads from memory by the same thread can begin. This
> + ensures that reads by the CPU from an incoming DMA buffer allocated
> + by dma_alloc_coherent() will not see stale data after reading from
> + the DMA engine's MMIO status register to establish that the DMA
> + transfer has completed.
> +
> + 4. A readX() by a CPU thread from the peripheral will complete before
> + any subsequent delay() loop can begin execution on the same thread.
> + This ensures that two MMIO register writes by the CPU to a peripheral
> + will arrive at least 1us apart if the first write is immediately read
> + back with readX() and udelay(1) is called prior to the second
> + writeX():
>
> writel(42, DEVICE_REGISTER_0); // Arrives at the device...
> readl(DEVICE_REGISTER_0);
> @@ -2600,8 +2604,10 @@ guarantees:
> These will perform appropriately for the type of access they're actually
> doing, be it inX()/outX() or readX()/writeX().
>
> -All of these accessors assume that the underlying peripheral is little-endian,
> -and will therefore perform byte-swapping operations on big-endian architectures.
> +With the exception of the string accessors (insX(), outsX(), readsX() and
> +writesX()), all of the above assume that the underlying peripheral is
> +little-endian and will therefore perform byte-swapping operations on big-endian
> +architectures.
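
For what it's worth, the distinction drawn here can be modelled in a few
lines of portable C. This is a sketch under assumptions, not the kernel
implementation: model_readl() and model_readsl() are hypothetical names,
and the FIFO is simplified to a flat byte array rather than a single
register read repeatedly.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Model of a non-string accessor: the four device bytes are treated
 * as little-endian, so the value returned is the same on any host.
 * On a big-endian CPU this amounts to a byte swap.
 */
uint32_t model_readl(const uint8_t *reg)
{
	return (uint32_t)reg[0] |
	       ((uint32_t)reg[1] << 8) |
	       ((uint32_t)reg[2] << 16) |
	       ((uint32_t)reg[3] << 24);
}

/*
 * Model of a string accessor: 32-bit units are copied as a raw byte
 * stream, with no byte swapping, so memory ends up holding the bytes
 * in the order the device produced them.
 */
void model_readsl(const uint8_t *fifo, void *dst, size_t count)
{
	memcpy(dst, fifo, count * 4);
}
```

With device bytes 78 56 34 12, the first returns 0x12345678 whatever the
host endianness, while the second leaves 78 56 34 12 in memory untouched.
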
>
>
> ========================================