Re: [PATCH RFC 09/10] vfio/type1: Use follow_pfn for VM_FPNMAP VMAs

From: Joao Martins
Date: Tue Feb 11 2020 - 11:24:39 EST


On 2/7/20 9:08 PM, Jason Gunthorpe wrote:
> On Fri, Jan 10, 2020 at 07:03:12PM +0000, Joao Martins wrote:
>> From: Nikita Leshenko <nikita.leshchenko@xxxxxxxxxx>
>>
>> Unconditionally interpreting vm_pgoff as a PFN is incorrect.
>>
>> VMAs created by /dev/mem do this, but in general VM_PFNMAP just means
>> that the VMA doesn't have an associated struct page and is being managed
>> directly by something other than the core mmu.
>>
>> Use follow_pfn like KVM does to find the PFN.
>>
>> Signed-off-by: Nikita Leshenko <nikita.leshchenko@xxxxxxxxxx>
>> drivers/vfio/vfio_iommu_type1.c | 6 +++---
>> 1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
>> index 2ada8e6cdb88..1e43581f95ea 100644
>> +++ b/drivers/vfio/vfio_iommu_type1.c
>> @@ -362,9 +362,9 @@ static int vaddr_get_pfn(struct mm_struct *mm, unsigned long vaddr,
>> vma = find_vma_intersection(mm, vaddr, vaddr + 1);
>>
>> if (vma && vma->vm_flags & VM_PFNMAP) {
>> - *pfn = ((vaddr - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
>> - if (is_invalid_reserved_pfn(*pfn))
>> - ret = 0;
>> + ret = follow_pfn(vma, vaddr, pfn);
>> + if (!ret && !is_invalid_reserved_pfn(*pfn))
>> + ret = -EOPNOTSUPP;
>> }
>
> FWIW this existing code is a huge hack and a security problem.
>
> I'm not sure how you could be successfully using this path on actual
> memory without hitting bad bugs?
>
ATM I think this codepath is largelly hit at the moment for MMIO (GPU
passthrough, or mdev). In the context of this patch, guest memory would be
treated similarly meaning the device-dax backing memory wouldn't have a 'struct
page' (as introduced in this series).

> Fudamentally VFIO can't retain a reference to a page from within a VMA
> without some kind of recount/locking/etc to allow the thing that put
> the page there to know it is still being used (ie programmed in a
> IOMMU) by VFIO.
>
> Otherwise it creates use-after-free style security problems on the
> page.
>
I take it you're referring to the past problems with long term page pinning +
fsdax? Or you had something else in mind, perhaps related to your LSFMM topic?

Here the memory can't be used by the kernel (and there's no struct page) except
from device-dax managing/tearing/driving the pfn region (which is static and the
underlying PFNs won't change throughout device lifetime), and vfio
pinning/unpinning the pfns (which are refcounted against multiple map/unmaps);

> This code needs to be deleted, not extended :(

To some extent it isn't really an extension: the patch was just removing the
assumption @vm_pgoff being the 'start pfn' on PFNMAP vmas. This is also
similarly done by get_vaddr_frames().

Joao