Re: [PATCH 5/5] powerpc: use the generic dma_ops_bypass mode

From: Alexey Kardashevskiy
Date: Sat Sep 05 2020 - 11:45:38 EST




On 31/08/2020 16:40, Christoph Hellwig wrote:
On Sun, Aug 30, 2020 at 11:04:21AM +0200, Cédric Le Goater wrote:
Hello,

On 7/8/20 5:24 PM, Christoph Hellwig wrote:
Use the DMA API bypass mechanism for direct window mappings. This uses
common code and speed up the direct mapping case by avoiding indirect
calls just when not using dma ops at all. It also fixes a problem where
the sync_* methods were using the bypass check for DMA allocations, but
those are part of the streaming ops.

Note that this patch loses the DMA_ATTR_WEAK_ORDERING override, which
has never been well defined, as is only used by a few drivers, which
IIRC never showed up in the typical Cell blade setups that are affected
by the ordering workaround.

Fixes: efd176a04bef ("powerpc/pseries/dma: Allow SWIOTLB")
Signed-off-by: Christoph Hellwig <hch@xxxxxx>
---
arch/powerpc/Kconfig | 1 +
arch/powerpc/include/asm/device.h | 5 --
arch/powerpc/kernel/dma-iommu.c | 90 ++++---------------------------
3 files changed, 10 insertions(+), 86 deletions(-)

I am seeing corruptions on a couple of POWER9 systems (boston) when
stressed with IO. stress-ng gives some results but I have first seen
it when compiling the kernel in a guest and this is still the best way
to raise the issue.

These systems have a SAS Adaptec controller:

0003:01:00.0 Serial Attached SCSI controller: Adaptec Series 8 12G SAS/PCIe 3 (rev 01)

When the failure occurs, the POWERPC EEH interrupt fires and dumps
lowlevel PHB4 registers among which :

[ 2179.251069490,3] PHB#0003[0:3]: phbErrorStatus = 0000028000000000
[ 2179.251117476,3] PHB#0003[0:3]: phbFirstErrorStatus = 0000020000000000

The bits raised identify a PPC 'TCE' error, which means it is related
to DMAs. See below for more details.


Reverting this patch "fixes" the issue but it is probably elsewhere,
in some other layers or in the aacraid driver. How should I proceed
to get more information ?

The aacraid DMA masks look like a mess.


It kind of does, and it is. The thing is that after f1565c24b596 the driver sets a 32-bit DMA mask, which in turn enables the small DMA window (not bypass), and since the aacraid driver has at least one bug with a double unmap of the same DMA handle, this somehow leads to EEH (a PCI DMA error).


The driver sets a 32-bit mask because it calls dma_get_required_mask() _before_ setting the mask, so dma_get_required_mask() does not take the dma_alloc_direct() path and instead calls powerpc's dma_iommu_get_required_mask(), which:

1. does the math like this (spot 2 bugs):

mask = 1ULL < (fls_long(tbl->it_offset + tbl->it_size) - 1)

2. even with that fixed, the driver still crashes, because f1565c24b596 removed the call to dma_iommu_bypass_supported(), so the IOMMU is enforced.
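
As a rough userspace illustration of the math in item 1 (not kernel code): the sketch below compares the original expression with the corrected one from the patch. fls_long_sketch() is a hypothetical stand-in for the kernel's fls_long(), the two required_mask_*() helpers are names made up for this example, and it assumes that it_offset and it_size are counted in IOMMU pages, which is what the added it_page_shift term suggests.

```c
#include <stdint.h>

/*
 * Hypothetical userspace stand-in for the kernel's fls_long():
 * returns the 1-based index of the highest set bit, or 0 for x == 0.
 */
static unsigned int fls_long_sketch(unsigned long x)
{
	unsigned int bit = 0;

	while (x) {
		bit++;
		x >>= 1;
	}
	return bit;
}

/*
 * The original expression. '<' is a comparison, not a shift, so the
 * intermediate value is only ever 0 or 1, and the page shift is not
 * taken into account at all.
 */
static uint64_t required_mask_buggy(unsigned long it_offset,
				    unsigned long it_size)
{
	uint64_t mask;

	mask = 1ULL < (fls_long_sketch(it_offset + it_size) - 1);
	mask += mask - 1;
	return mask;
}

/*
 * The expression from the patch: shift instead of compare, and convert
 * the table extent from pages to bytes by adding the page shift.
 */
static uint64_t required_mask_fixed(unsigned long it_offset,
				    unsigned long it_size,
				    unsigned int it_page_shift)
{
	uint64_t mask;

	mask = 1ULL << (fls_long_sketch(it_offset + it_size) +
			it_page_shift - 1);
	mask += mask - 1;
	return mask;
}
```

For a made-up window of 0x8000 entries of 64K pages starting at offset 0 (so 2GB in total), required_mask_buggy(0, 0x8000) returns 0x1, while required_mask_fixed(0, 0x8000, 16) returns 0xffffffff, i.e. a 32-bit mask.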


The patch below (the first hunk, to be precise) brings things back to where they were (64-bit mask). The double unmap bug in the driver is still to be investigated.



diff --git a/arch/powerpc/kernel/dma-iommu.c b/arch/powerpc/kernel/dma-iommu.c
index 569fecd7b5b2..785abccb90fc 100644
--- a/arch/powerpc/kernel/dma-iommu.c
+++ b/arch/powerpc/kernel/dma-iommu.c
@@ -117,10 +117,18 @@ u64 dma_iommu_get_required_mask(struct device *dev)
 	struct iommu_table *tbl = get_iommu_table_base(dev);
 	u64 mask;

+	if (dev_is_pci(dev)) {
+		u64 bypass_mask = dma_direct_get_required_mask(dev);
+
+		if (dma_iommu_bypass_supported(dev, bypass_mask))
+			return bypass_mask;
+	}
+
 	if (!tbl)
 		return 0;

-	mask = 1ULL < (fls_long(tbl->it_offset + tbl->it_size) - 1);
+	mask = 1ULL << (fls_long(tbl->it_offset + tbl->it_size) +
+			tbl->it_page_shift - 1);
 	mask += mask - 1;

 	return mask;



--
Alexey