Re: [PATCHv2 8/8] videobuf2: handle non-contiguous DMA allocations

From: Tomasz Figa
Date: Fri Jun 18 2021 - 00:44:42 EST


On Fri, Jun 18, 2021 at 1:25 PM Christoph Hellwig <hch@xxxxxx> wrote:
>
> On Fri, Jun 18, 2021 at 12:21:33PM +0900, Tomasz Figa wrote:
> > > On Thu, Jun 17, 2021 at 06:40:58PM +0900, Tomasz Figa wrote:
> > > > Sorry, I meant dma_alloc_attrs() and yes, it's indeed a misnomer. Our
> > > > use case basically has no need for the additional coherent mapping, so
> > > > creation of it can be skipped to save some vmalloc space. (Yes, it
> > > > probably only matters for 32-bit architectures.)
> > >
> > > Yes, that is the normal use case, and it is solved by using
> > > dma_alloc_noncoherent or dma_alloc_noncontigous without the vmap
> > > step.
> >
> > True, silly me. Probably not enough coffee at the time I was looking at it.
> >
> > With that, wouldn't it be possible to completely get rid of
> > dma_alloc_{coherent,attrs}() and use dma_alloc_noncontiguous() +
> > optional kernel and/or userspace mapping helper everywhere? (Possibly
> > renaming it to something as simple as dma_alloc().
>
> Well, dma_alloc_coherent users want a non-cached mapping. And while
> some architectures provide that using a vmap with "uncached" bits in the
> PTE to provide that, this:
>
> a) is not possibly everywhere
> b) even where possible is not always the best idea as it creates mappings
> with differnet cachability bets

I think this could be addressed by having a dma_vmap() helper that
does the right thing, whether it's vmap() or dma_common_pages_remap()
as appropriate. Or would be this still insufficient for some
architectures?

>
> And even without that dma_alloc_noncoherent causes less overhead than
> dma_alloc_noncontigious if you only need a single contiguous range.
>

Given that behind the scenes dma_alloc_noncontiguous() would also just
call __dma_alloc_pages() for devices that need contiguous pages, would
the overhead be basically the creation of a single-entry sgtable?

> So while I'm happy we have something useful for more complex drivers like
> v4l I think the simple dma_alloc_coherent API, including some of the less
> crazy flags for dma_alloc_attrs is the right thing to use for more than
> 90% of the use cases.

One thing to take into account here is that many drivers use the
existing "simple" way, just because there wasn't a viable alternative
to do something better. Agreed, though, that we shouldn't optimize for
the rare cases.

Best regards,
Tomasz