Re: [PATCH 2/3] drm/msm: add DRM_MSM_GEM_SYNC_CACHE for non-coherent cache maintenance

From: Jonathan Marek
Date: Tue Oct 06 2020 - 09:21:11 EST


On 10/6/20 3:23 AM, Christoph Hellwig wrote:
On Mon, Oct 05, 2020 at 10:35:43AM -0400, Jonathan Marek wrote:
The cache synchronization doesn't have anything to do with IOMMU (for
example: cache synchronization would be useful in cases where drm/msm
doesn't use IOMMU).

It has to do with doing DMA. And we have two frameworks for doing DMA:
the DMA API, which is for general driver use and which, as part of its
design, includes cache maintenance hidden behind the concept of
ownership transfers, and the much more bare-bones IOMMU API.

If people want to use the "raw" IOMMU API with non-cache-coherent
devices we'll need a cache maintenance API that goes along with it.
It could either be formally part of the IOMMU API or be separate.

What is needed is to call arch_sync_dma_for_{cpu,device} (which is what I
went with initially, but then decided to re-use drm/msm's
sync_for_{cpu,device}). But you are also saying those functions aren't for
driver use, and I doubt IOMMU maintainers will want to add wrappers for
these functions just to satisfy this "not for driver use" requirement.

arch_sync_dma_for_{cpu,device} are low-level helpers (and not very
great ones at that). They definitely should not be used by drivers.
They would be very useful building blocks for an IOMMU cache
maintenance API.

Of course the best outcome would be if we could find a way for the MSM
drm driver to just use DMA API and not deal with the lower level
abstractions. Do you remember why the driver went for use of the IOMMU
API?


One reason drm/msm can't use the DMA API is multiple page table support (landing in 5.10), which definitely couldn't work with the DMA API.

Another one is being able to choose the address for mappings, which AFAIK the DMA API can't do. Somewhat related to this: qcom hardware often has ranges of allowed addresses, which the dma_mask mechanism fails to represent. What I see is drivers using dma_mask as a "maximum address", and since addresses are allocated from the top, it generally works.
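
To make the dma_mask point concrete, here is a minimal userspace sketch (not kernel code). It assumes a made-up allowed window of 0x10000000..0x7fffffff; struct addr_window, mask_allows() and window_allows() are hypothetical names used only for illustration. A mask can only say "anything up to this address", so a low address that the hardware would reject still passes the mask check:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mask-style limit: any buffer that lies entirely in [0, mask] is accepted. */
bool mask_allows(uint64_t mask, uint64_t addr, uint64_t size)
{
    return size != 0 && addr <= mask && size - 1 <= mask - addr;
}

/* Hypothetical window-style limit: only [start, end] (inclusive) is valid. */
struct addr_window {
    uint64_t start;
    uint64_t end;
};

bool window_allows(struct addr_window w, uint64_t addr, uint64_t size)
{
    return size != 0 && addr >= w.start && addr <= w.end &&
           size - 1 <= w.end - addr;
}

/* Illustrative values only; they do not describe any real qcom part. */
void check_mask_vs_window(void)
{
    struct addr_window w = { 0x10000000ULL, 0x7fffffffULL };
    uint64_t mask = w.end;            /* mask used as a "maximum address" */
    uint64_t high = 0x7fff0000ULL;    /* near the top, inside the window */
    uint64_t low  = 0x00001000ULL;    /* near the bottom, below the window */

    /* Top-down allocation lands here, so mask and window agree. */
    assert(mask_allows(mask, high, 0x1000));
    assert(window_allows(w, high, 0x1000));

    /* The mask cannot express the lower bound of the window. */
    assert(mask_allows(mask, low, 0x1000));
    assert(!window_allows(w, low, 0x1000));
}
```

The two checks only disagree for low addresses, which is why allocating from the top tends to hide the problem.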

But let us imagine drm/msm switches to using the DMA API. a2xx GPUs have their own very basic MMU (implemented by msm_gpummu.c), which would need to implement dma_map_ops, which in turn would have to call arch_sync_dma_for_{cpu,device}. So drm/msm would still need to call arch_sync_dma_for_{cpu,device} in that scenario.
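
A rough userspace model of that layering, only to illustrate the argument. Everything here is a stand-in: fake_arch_sync_for_device() merely counts calls in place of the real arch helper, and struct fake_gpummu and fake_gpummu_map_page() are hypothetical and do not reflect the actual msm_gpummu.c code or the real dma_map_ops signatures. The point is just that a map operation for a non-coherent device ends up doing the cache maintenance itself:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the kernel's DMA direction enum. */
enum fake_dma_dir { FAKE_TO_DEVICE, FAKE_FROM_DEVICE };

/* Counts simulated cache maintenance calls. */
int cache_sync_calls;

/* Stand-in for the arch-level sync helper; it only records the call. */
void fake_arch_sync_for_device(uint64_t paddr, size_t size,
                               enum fake_dma_dir dir)
{
    (void)paddr;
    (void)size;
    (void)dir;
    cache_sync_calls++;
}

#define FAKE_PAGE_SIZE 4096u
#define FAKE_ENTRIES   16u

/* Hypothetical, heavily simplified GPU MMU: one flat table of pages. */
struct fake_gpummu {
    uint64_t table[FAKE_ENTRIES];
    unsigned int next;
};

/*
 * Hypothetical map operation: sync the CPU cache for the page, then
 * install it in the next free slot. Returns the device address, or
 * UINT64_MAX when the table is full.
 */
uint64_t fake_gpummu_map_page(struct fake_gpummu *mmu, uint64_t paddr,
                              enum fake_dma_dir dir)
{
    if (mmu->next >= FAKE_ENTRIES)
        return UINT64_MAX;

    /* Non-coherent device: the cache must be synced before it reads. */
    fake_arch_sync_for_device(paddr, FAKE_PAGE_SIZE, dir);

    mmu->table[mmu->next] = paddr;
    return (uint64_t)mmu->next++ * FAKE_PAGE_SIZE;
}

void check_map_syncs_cache(void)
{
    struct fake_gpummu mmu = { {0}, 0 };
    int before = cache_sync_calls;

    assert(fake_gpummu_map_page(&mmu, 0x80000000ULL, FAKE_TO_DEVICE) == 0);
    assert(cache_sync_calls == before + 1);
}
```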