Re: [PATCH RFC 09/10] vfio/type1: Use follow_pfn for VM_FPNMAP VMAs

From: Jason Gunthorpe
Date: Tue Feb 11 2020 - 11:50:41 EST


On Tue, Feb 11, 2020 at 04:23:49PM +0000, Joao Martins wrote:
> On 2/7/20 9:08 PM, Jason Gunthorpe wrote:
> > On Fri, Jan 10, 2020 at 07:03:12PM +0000, Joao Martins wrote:
> >> From: Nikita Leshenko <nikita.leshchenko@xxxxxxxxxx>
> >>
> >> Unconditionally interpreting vm_pgoff as a PFN is incorrect.
> >>
> >> VMAs created by /dev/mem do this, but in general VM_PFNMAP just means
> >> that the VMA doesn't have an associated struct page and is being managed
> >> directly by something other than the core mmu.
> >>
> >> Use follow_pfn like KVM does to find the PFN.
> >>
> >> Signed-off-by: Nikita Leshenko <nikita.leshchenko@xxxxxxxxxx>
> >> drivers/vfio/vfio_iommu_type1.c | 6 +++---
> >> 1 file changed, 3 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> >> index 2ada8e6cdb88..1e43581f95ea 100644
> >> +++ b/drivers/vfio/vfio_iommu_type1.c
> >> @@ -362,9 +362,9 @@ static int vaddr_get_pfn(struct mm_struct *mm, unsigned long vaddr,
> >> vma = find_vma_intersection(mm, vaddr, vaddr + 1);
> >>
> >> if (vma && vma->vm_flags & VM_PFNMAP) {
> >> - *pfn = ((vaddr - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
> >> - if (is_invalid_reserved_pfn(*pfn))
> >> - ret = 0;
> >> + ret = follow_pfn(vma, vaddr, pfn);
> >> + if (!ret && !is_invalid_reserved_pfn(*pfn))
> >> + ret = -EOPNOTSUPP;
> >> }
> >
> > FWIW this existing code is a huge hack and a security problem.
> >
> > I'm not sure how you could be successfully using this path on actual
> > memory without hitting bad bugs?
> >
> ATM I think this codepath is largelly hit at the moment for MMIO (GPU
> passthrough, or mdev). In the context of this patch, guest memory would be
> treated similarly meaning the device-dax backing memory wouldn't have a 'struct
> page' (as introduced in this series).

I think it is being used specifically to allow two VFIO's to be
inserted into a VM and have the IOMMU setup to allow MMIO access.

> > Fudamentally VFIO can't retain a reference to a page from within a VMA
> > without some kind of recount/locking/etc to allow the thing that put
> > the page there to know it is still being used (ie programmed in a
> > IOMMU) by VFIO.
> >
> > Otherwise it creates use-after-free style security problems on the
> > page.
>
> I take it you're referring to the past problems with long term page pinning +
> fsdax? Or you had something else in mind, perhaps related to your LSFMM topic?

No. I'm refering to retaining access to memory backed a VMA without
holding any kind of locking on it. This is an access after free scenario.

It *should* be like a long term page pin so that the VMA owner knows
something is happening.

> Here the memory can't be used by the kernel (and there's no struct page) except
> from device-dax managing/tearing/driving the pfn region (which is static and the
> underlying PFNs won't change throughout device lifetime), and vfio
> pinning/unpinning the pfns (which are refcounted against multiple map/unmaps);

For instance if you tear down the device-dax then VFIO will happily
continue to reference the memory. This is a bug.

There are other cases that escalate to security bugs.

> > This code needs to be deleted, not extended :(
>
> To some extent it isn't really an extension: the patch was just removing the
> assumption @vm_pgoff being the 'start pfn' on PFNMAP vmas. This is also
> similarly done by get_vaddr_frames().

You are extending it in the sense that you plan to use it for more
cases than VMAs created by some other VFIO. That should not be
done as it will only complicate fixing this code.

KVM is allowed to use follow_pfn because it uses MMU notifiers and
does not allow the result of follow_pfn to outlive the VMA (AFAIK at
least). So it should be safe.

Jason