Re: [PATCH 2/2] mm/frame-vec: use FOLL_LONGTERM

From: Daniel Vetter
Date: Sun Oct 04 2020 - 12:09:43 EST


On Sun, Oct 4, 2020 at 2:51 PM Jason Gunthorpe <jgg@xxxxxxxx> wrote:
>
> On Sat, Oct 03, 2020 at 11:40:22AM +0200, Daniel Vetter wrote:
>
> > > That leaves the only interesting places as vb2_dc_get_userptr() and
> > > vb2_vmalloc_get_userptr() which both completely fail to follow the
> > > REQUIRED behavior in the function's comment about checking PTEs. It
> > > just DMA maps them. Badly broken.
> > >
> > > Guessing this hackery is for some embedded P2P DMA transfer?
> >
> > Yeah, see also the follow_pfn trickery in
> > videobuf_dma_contig_user_get(), I think this is fully intentional and
> > userspace abi we can't break :-/
>
> We don't need to break uABI, it just needs to work properly in the
> kernel:
>
> vma = find_vma_intersection()
> dma_buf = dma_buf_get_from_vma(vma)
> sg = dma_buf_p2p_dma_map(dma_buf)
> [.. do dma ..]
> dma_buf_unmap(sg)
> dma_buf_put(dma_buf)
>
> It is as we discussed before, dma buf needs to be discoverable from a
> VMA, at least for users doing this kind of stuff.

I'm not a big fan of magic behaviour like this, there's more to
dma-buf buffer sharing than just "how do I get at the backing
storage". Thus far we've done everything rather explicitly. Plus with
exynos and habanalabs converted there's only v4l left over, and that
has a proper dma-buf import path already.

> > Yup this should be done with dma_buf instead, and v4l has that. But
> > old uapi and all that. This is why I said we might need a new
> > VM_DYNAMIC_PFNMAP or so, to make follow_pfn not resolve this in the
> > case where the driver manages the underlying iomem range (or whatever
> > it is) dynamically and moves buffer objects around, like drm drivers
> > do. But I looked, and we've run out of vma->vm_flags :-(
>
> A VM flag doesn't help - we need to introduce some kind of lifetime,
> and that has to be derived from the VMA. It needs data not just a flag

I don't want to make it work, I just want to make it fail. Rough idea
I have in mind is to add a follow_pfn_longterm, for all callers which
aren't either synchronized through mmap_sem or an mmu_notifier. If
this really breaks anyone's use-case we can add a tainting kernel
option which re-enables this (we've done something similar for
phys_addr_t based buffer sharing in fbdev, entirely unfixable since
the other driver has to just blindly trust that what userspace passes
around is legit). This here isn't unfixable, but if v4l people want to
keep it without a big "security hole here" sticker, they should do the
work, not me :-)

> > The other problem is that I also have no real working clue about all
> > the VM_* flags and what they all mean, and whether drm drivers set the
> > right ones in all cases (they probably don't, but oh well).
> > Documentation for this stuff in headers is a bit thin at times.
>
> Yah, I don't really know either :\
>
> The comment above vm_normal_page() is a bit helpful. Don't know what
> VM_IO/VM_PFNMAP mean in their 3 combinations
>
> There are very few places that set VM_PFNMAP without VM_IO..

Best I could find is:
- mk68 seems to outright reject pagefaults on VM_IO vma
- some places set VM_IO together with VM_MIXEDMAP instead of
VM_PFNMAP. There's some comments that this makes cow possible, but I
guess that's for the old pfn remap vma (remap_file_pages, which is
removed now). But really no clue.

VM_IO | VM_MIXEDMAP kinda makes me wonder whether follow_pfn gets the
page refcounting all right or horribly wrong in some cases ...
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch