Re: [PATCH 1/2] drm: add cache support for arm64

From: Christoph Hellwig
Date: Fri Aug 09 2019 - 04:15:02 EST


On Thu, Aug 08, 2019 at 01:58:08PM +0200, Daniel Vetter wrote:
> > > We use shmem to get at swappable pages. We generally just assume that
> > > the gpu can get at those pages, but things fall apart in fun ways:
> > > - some setups somehow inject bounce buffers. Some drivers just give
> > > up, others try to allocate a pool of pages with dma_alloc_coherent.
> > > - some devices are misdesigned and can't access as much as the cpu. We
> > > allocate using GFP_DMA32 to fix that.
> >
> > Well, for shmem you can't really call allocators directly, right?
>
> We can pass gfp flags to shmem_read_mapping_page_gfp, which is just about
> enough for the 2 cases on intel platforms where the gpu can only access
> 4G, but the cpu has way more.

Right. And that works for architectures without weird DMA offsets and
devices that have exactly a 32-bit DMA limit. It falls flat for all
the more complex ones, unfortunately.
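
To make that concrete, here is a rough userspace model, not kernel code.
It assumes the bus address is simply the physical address plus a fixed
offset; dev_can_reach() and its parameters are made up for illustration.
GFP_DMA32 only promises a page below 4G in CPU physical terms, which is
the first case below and nothing more:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model: bus address = physical address + bus_offset.
 * Returns true if a device with a dma_bits-wide address limit can reach
 * the whole buffer [phys, phys + size). Assumes size > 0.
 */
bool dev_can_reach(uint64_t phys, uint64_t size, int64_t bus_offset,
		   unsigned int dma_bits)
{
	uint64_t mask = dma_bits >= 64 ? UINT64_MAX : (1ULL << dma_bits) - 1;
	/* Last byte of the buffer as seen from the device side. */
	uint64_t last = phys + (uint64_t)bus_offset + size - 1;

	return last <= mask;
}
```

With a page at 3G physical (something GFP_DMA32 may well hand out), a
32-bit device with no offset can reach it, a 30-bit device cannot, and a
32-bit device behind a +2G bus offset cannot either.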

> > But userspace malloc really means dma_map_* anyway, so not really
> > relevant for memory allocations.
>
> It does tie in, since we'll want a dma_map which fails if a direct mapping
> isn't possible. It also helps the driver code a lot if we could use the
> same low-level flushing functions between our own memory (whatever that
> is) and anon pages from malloc. And in all the cases if it's not possible,
> we want a failure, not elaborate attempts at hiding the differences
> between all possible architectures out there.

At the very lowest level it all comes down to the same three primitives we
talked about anyway, but there are different ways of combining them.
For the streaming mappings, look at the table in arch/arc/mm/dma.c I
mentioned earlier. For memory that is prepared for just mmapping to
userspace without a kernel user we'll always do a wb+inv. But as the
other subthread shows we'll need to eventually look into unmapping
(or remapping with the same attributes) of that memory in kernel space
to avoid speculation bugs (or just invalid combinations on x86 where
we check for that), so the API will be a little more complex.
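
As a rough sketch of how the three primitives (writeback, invalidate,
writeback+invalidate) get combined, here is a userspace model of that
table as I recall it from arch/arc/mm/dma.c; check the file itself
before relying on it. The enum and function names are made up for
illustration and are not a kernel API:

```c
/* Hypothetical userspace model; nothing here is kernel code. */
enum dma_dir { DIR_BIDIRECTIONAL, DIR_TO_DEVICE, DIR_FROM_DEVICE };

enum cache_op { OP_NONE, OP_WB, OP_INV, OP_WB_INV };

/* Cache maintenance when handing a streaming buffer to the device. */
enum cache_op sync_for_device(enum dma_dir dir)
{
	switch (dir) {
	case DIR_TO_DEVICE:	return OP_WB;		/* push dirty lines out */
	case DIR_FROM_DEVICE:	return OP_INV;		/* drop stale lines */
	case DIR_BIDIRECTIONAL:	return OP_WB_INV;
	}
	return OP_NONE;
}

/* Cache maintenance when handing the buffer back to the CPU. */
enum cache_op sync_for_cpu(enum dma_dir dir)
{
	switch (dir) {
	case DIR_TO_DEVICE:	return OP_NONE;
	case DIR_FROM_DEVICE:				/* undo speculative prefetches */
	case DIR_BIDIRECTIONAL:	return OP_INV;
	}
	return OP_NONE;
}

/* Memory only ever mmapped to userspace, no kernel user: always wb+inv. */
enum cache_op prepare_for_user_mmap(void)
{
	return OP_WB_INV;
}
```

The invalidate on the for_cpu side of a from-device transfer is there
because the CPU may have speculatively pulled lines back in while the
device was writing.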

Btw, are all DRM drivers using vmf_insert_* to pre-populate the mapping
like the MSM case, or are some doing dynamic faulting from
vm_ops->fault?