Re: [RFC PATCH 05/10] vfio: Create a vfio_device from vma lookup

From: Alex Williamson
Date: Thu Mar 04 2021 - 16:40:21 EST


On Thu, 25 Feb 2021 19:49:49 -0400
Jason Gunthorpe <jgg@xxxxxxxxxx> wrote:

> On Thu, Feb 25, 2021 at 03:21:13PM -0700, Alex Williamson wrote:
>
> > This is where it gets tricky. The vm_pgoff we get from
> > file_operations.mmap is already essentially describing an offset from
> > the base of a specific resource. We could convert that from an absolute
> > offset to a pfn offset, but it's only the bus driver code (ex.
> > vfio-pci) that knows how to get the base, assuming there is a single
> > base per region (we can't assume enough bits per region to store
> > absolute pfn). Also note that you're suggesting that all vfio mmaps
> > would need to standardize on the vfio-pci implementation of region
> > layouts. Not that most drivers haven't copied vfio-pci, but we've
> > specifically avoided exposing it as a fixed uAPI such that we could have
> > the flexibility for a bus driver to implement regions offsets however
> > they need.
>
> Okay, well the bus driver owns the address space and the bus driver is
> in control of the vm_pgoff. If it doesn't want to zap then it doesn't
> need to do anything
>
> vfio-pci can consistently use the index encoding and be fine
>
> > So I'm not really sure what this looks like. Within vfio-pci we could
> > keep the index bits in place to allow unmmap_mapping_range() to
> > selectively zap matching vm_pgoffs but expanding that to a vfio
> > standard such that the IOMMU backend can also extract a pfn looks very
> > limiting, or ugly. Thanks,
>
> Lets add a op to convert a vma into a PFN range. The map code will
> pass the vma to the op and get back a pfn (or failure).
>
> pci will switch the vm_pgoff to an index, find the bar base and
> compute the pfn.
>
> It is starting to look more and more like dma buf though

How terrible would it be if vfio-core used a shared vm_private_data
structure and inserted itself into the vm_ops call chain to reference
count that structure? We already wrap the device file descriptor via
vfio_device_fops.mmap, so we could:

struct vfio_vma_private_data *priv;

priv = kzalloc(...

priv->device = device;
vma->vm_private_data = priv;

device->ops->mmap(device->device_data, vma);

if (vma->vm_private_data == priv) {
priv->vm_ops = vma->vm_ops;
vma->vm_ops = &vfio_vm_ops;
} else
kfree(priv);

Where:

struct vfio_vma_private_data {
struct vfio_device *device;
unsigned long pfn_base;
void *private_data; // maybe not needed
const struct vm_operations_struct *vm_ops;
struct kref kref;
unsigned int release_notification:1;
};

Therefore unless a bus driver opts-out by replacing vm_private_data, we
can identify participating vmas by the vm_ops and have flags indicating
if the vma maps device memory such that vfio_get_device_from_vma()
should produce a device reference. The vfio IOMMU backends would also
consume this, ie. if they get a valid vfio_device from the vma, use the
pfn_base field directly. vfio_vm_ops would wrap the bus driver
callbacks and provide reference counting on open/close to release this
object.

I'm not thrilled with a vfio_device_ops callback plumbed through
vfio-core to do vma-to-pfn translation, so I thought this might be a
better alternative. Thanks,

Alex