Re: [PATCH] arm64: mm: take CWG into account in __inval_cache_range()

From: Ard Biesheuvel
Date: Tue Apr 19 2016 - 10:48:38 EST


On 19 April 2016 at 16:13, Catalin Marinas <catalin.marinas@xxxxxxx> wrote:
> On Tue, Apr 19, 2016 at 03:08:27PM +0200, Ard Biesheuvel wrote:
>> On 19 April 2016 at 14:56, Mark Rutland <mark.rutland@xxxxxxx> wrote:
>> > Is sharing a CWG broken by design, or is there some caveat I'm missing
>> > that prevents/prohibits unrelated data from being dirtied?
>>
>> I think sharing a CWG window is broken by design, now that I think of
>> it. The invalidate is part of the dma_unmap() code path, which means
>> the cleaning we do on the edges of the buffer may clobber data in
>> memory that the device has just written, and not cleaning isn't an
>> option either, since a plain invalidate would discard dirty unrelated
>> data sharing the same line.
>
> Indeed. It only works if the cache line is always clean (e.g. read or
> speculatively loaded) but you can't guarantee what's accessing it. I
> think we shouldn't even bother with the edges at all in __dma_inv_range.
>

Agreed.
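
For reference, this is roughly what __dma_inv_range() does on the edges
today, paraphrased in C from memory (the real code is assembly in
arch/arm64/mm/cache.S, so treat the details as illustrative only):

static void __dma_inv_range_sketch(unsigned long start, unsigned long end,
                                   unsigned long line)
{
        unsigned long mask = line - 1;
        unsigned long addr = start & ~mask;

        if (end & mask)                 /* unaligned tail: clean + invalidate */
                asm volatile("dc civac, %0" : : "r" (end & ~mask) : "memory");

        if (start & mask) {             /* unaligned head: clean + invalidate */
                asm volatile("dc civac, %0" : : "r" (addr) : "memory");
                addr += line;
        }

        for (; addr < (end & ~mask); addr += line)
                asm volatile("dc ivac, %0" : : "r" (addr) : "memory");

        dsb(sy);
}

The 'dc civac' on the unaligned head and tail is exactly the clean that
can push a stale CPU copy over data the device has just written, while a
plain 'dc ivac' there would instead throw away dirty unrelated data
sharing the line, so neither choice is safe when a CWG is shared.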

> The best we could do is to warn if ARCH_DMA_MINALIGN is smaller than CWG
> (as Robin suggested, we could do this only if we have non-coherent DMA
> masters via arch_setup_dma_ops()). Quick hack below:
>
> -------------------------------8<-----------------------------------
> diff --git a/arch/arm64/include/asm/cache.h b/arch/arm64/include/asm/cache.h
> index 5082b30bc2c0..5967fcbb617a 100644
> --- a/arch/arm64/include/asm/cache.h
> +++ b/arch/arm64/include/asm/cache.h
> @@ -28,7 +28,7 @@
> * cache before the transfer is done, causing old data to be seen by
> * the CPU.
> */
> -#define ARCH_DMA_MINALIGN L1_CACHE_BYTES
> +#define ARCH_DMA_MINALIGN 128
>
> #ifndef __ASSEMBLY__
>
> @@ -37,7 +37,7 @@
> static inline int cache_line_size(void)
> {
> u32 cwg = cache_type_cwg();
> - return cwg ? 4 << cwg : L1_CACHE_BYTES;
> + return cwg ? 4 << cwg : ARCH_DMA_MINALIGN;

Unrelated, but this does not look right: if the CWG field is zero, we
should either assume the architectural maximum of 2 KB, or iterate over
all the CCSIDR values and take the maximum line size.
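
Something along these lines, perhaps (untested sketch; accessors as in
asm/sysreg.h, and the CSSELR/CCSIDR pair would need preemption disabled
or to run on the CPU of interest):

static int max_ccsidr_line_size(void)
{
        u64 clidr = read_sysreg(clidr_el1);
        int level, max_line = 0;

        for (level = 0; level < 7; level++) {
                u64 ctype = (clidr >> (level * 3)) & 0x7;
                u64 ccsidr;

                if (ctype < 2)          /* no cache, or I-cache only */
                        continue;

                write_sysreg(level << 1, csselr_el1);   /* data/unified cache */
                isb();
                ccsidr = read_sysreg(ccsidr_el1);

                /* CCSIDR_EL1.LineSize == log2(bytes per line) - 4 */
                max_line = max(max_line, 16 << (ccsidr & 0x7));
        }

        return max_line ? max_line : 2048;      /* architectural CWG maximum */
}

Note that with a 2 KB fallback the WARN_TAINT_ONCE below would then
trigger on any non-coherent system that doesn't report CWG, which is
arguably the right behaviour.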

> }
>
> #endif /* __ASSEMBLY__ */
> diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
> index 943f5140e0f3..b1d4d1f5b0dd 100644
> --- a/arch/arm64/kernel/cpufeature.c
> +++ b/arch/arm64/kernel/cpufeature.c
> @@ -970,7 +970,6 @@ static void __init setup_feature_capabilities(void)
> void __init setup_cpu_features(void)
> {
> u32 cwg;
> - int cls;
>
> /* Set the CPU feature capabilies */
> setup_feature_capabilities();
> @@ -983,13 +982,9 @@ void __init setup_cpu_features(void)
> * Check for sane CTR_EL0.CWG value.
> */
> cwg = cache_type_cwg();
> - cls = cache_line_size();
> if (!cwg)
> - pr_warn("No Cache Writeback Granule information, assuming cache line size %d\n",
> - cls);
> - if (L1_CACHE_BYTES < cls)
> - pr_warn("L1_CACHE_BYTES smaller than the Cache Writeback Granule (%d < %d)\n",
> - L1_CACHE_BYTES, cls);
> + pr_warn("No Cache Writeback Granule information, assuming %d\n",
> + ARCH_DMA_MINALIGN);
> }
>
> static bool __maybe_unused
> diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
> index a6e757cbab77..08232faf38ad 100644
> --- a/arch/arm64/mm/dma-mapping.c
> +++ b/arch/arm64/mm/dma-mapping.c
> @@ -987,6 +987,11 @@ static void __iommu_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
> void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
> struct iommu_ops *iommu, bool coherent)
> {
> + WARN_TAINT_ONCE(!coherent && ARCH_DMA_MINALIGN < cache_line_size(),
> + TAINT_CPU_OUT_OF_SPEC,
> + "ARCH_DMA_MINALIGN smaller than the Cache Writeback Granule (%d < %d)",
> + ARCH_DMA_MINALIGN, cache_line_size());
> +
> if (!dev->archdata.dma_ops)
> dev->archdata.dma_ops = &swiotlb_dma_ops;
>
> -------------------------------8<-----------------------------------
>
> This way we could also reduce L1_CACHE_SHIFT back to 6, as it shouldn't
> cause any correctness issues (potentially performance ones, but so far
> it seems that increasing it made things worse on some SoCs).
>
> The reason for cache_line_size() using ARCH_DMA_MINALIGN instead of
> L1_CACHE_BYTES is that I can see it used in the sl*b code when
> SLAB_HWCACHE_ALIGN is passed, and on a few occasions for DMA buffers.
>
> --
> Catalin