RE: [RFC 03/20] vfio: Add vfio_[un]register_device()

From: Tian, Kevin
Date: Tue Sep 28 2021 - 23:40:22 EST


> From: David Gibson <david@xxxxxxxxxxxxxxxxxxxxx>
> Sent: Wednesday, September 29, 2021 10:44 AM
>
> >
> > One open about how to organize the device nodes under /dev/vfio/devices/.
> > This RFC adopts a simple policy by keeping a flat layout with mixed devname
> > from all kinds of devices. The prerequisite of this model is that devnames
> > from different bus types are unique formats:
> >
> > /dev/vfio/devices/0000:00:14.2 (pci)
> > /dev/vfio/devices/PNP0103:00 (platform)
> > /dev/vfio/devices/83b8f4f2-509f-382f-3c1e-e6bfe0fa1001 (mdev)
>
> Oof. I really don't think this is a good idea. Ensuring that a
> format is "unique" in the sense that it can't collide with any of the
> other formats, for *every* value of the parameters on both sides is
> actually pretty complicated in general.
>
> I think per-type sub-directories would be helpful here, Jason's
> suggestion of just sequential numbers would work as well.

We'll follow Jason's suggestion (sequential numbers) in the next version.
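
As a rough illustration of what sequential naming could look like, here is a minimal userspace sketch. The "vfio%d" pattern, NODE_DIR, next_index and alloc_node_name() are all hypothetical names picked for this example; the thread only agrees on "sequential numbers", not on a final naming format.

```c
#include <stddef.h>
#include <stdio.h>

/* Directory taken from the RFC; the "vfio%d" suffix is an assumption. */
#define NODE_DIR "/dev/vfio/devices/"

/* Next free index. Not thread-safe; a real allocator would need locking. */
static int next_index;

/*
 * Format the next node path into buf.
 * Returns the index used, or -1 if buf is NULL or too small.
 */
static int alloc_node_name(char *buf, size_t len)
{
	int idx = next_index;
	int n;

	if (!buf)
		return -1;

	n = snprintf(buf, len, NODE_DIR "vfio%d", idx);
	if (n < 0 || (size_t)n >= len)
		return -1;	/* would have been truncated */

	next_index++;		/* consume the index only on success */
	return idx;
}
```

Names generated this way never depend on the bus-specific devname, so the cross-format collision question goes away.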

> > + /*
> > + * Refcounting can't start until the driver call register. Don't
> > + * start twice when the device is exposed in both group and nongroup
> > + * interfaces.
> > + */
> > + if (!refcount_read(&device->refcount))
>
> Is there a possible race here with something getting in and
> incrementing the refcount between the read and set?

This check will not be needed in the next version, which will always create
both group and nongroup interfaces for every device (and let the driver
provide a .bind_iommufd() callback to indicate whether the nongroup
interface is functional). Registration will be handled centrally within the
existing vfio_[un]register_group_dev(), so the race above is no longer
a concern.
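
For reference, the window David points at can be shown with a small userspace sketch using C11 atomics. This is not kernel code: struct obj, start_refcount_racy() and start_refcount_once() are made-up names, and compare-and-exchange is only one possible way to close such a window, not what the next version will do.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace stand-in for the refcount field in the quoted hunk. */
struct obj {
	atomic_int refcount;
};

/*
 * Check-then-set, mirroring the quoted hunk. Another thread may modify
 * refcount between the load and the store, and the store would then
 * overwrite that update.
 */
static bool start_refcount_racy(struct obj *o)
{
	if (atomic_load(&o->refcount) != 0)
		return false;
	atomic_store(&o->refcount, 1);
	return true;
}

/*
 * Single atomic step: only the caller that observes 0 installs 1;
 * every other caller sees a nonzero value and backs off.
 */
static bool start_refcount_once(struct obj *o)
{
	int expected = 0;

	return atomic_compare_exchange_strong(&o->refcount, &expected, 1);
}
```

Both helpers behave the same when called from a single thread; they differ only when two callers interleave.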

>
> > + refcount_set(&device->refcount, 1);
> >
> > mutex_lock(&group->device_lock);
> > list_add(&device->group_next, &group->device_list);
> > @@ -804,7 +810,78 @@ int vfio_register_group_dev(struct vfio_device *device)
> >
> > return 0;
> > }
> > -EXPORT_SYMBOL_GPL(vfio_register_group_dev);
> > +
> > +static int __vfio_register_nongroup_dev(struct vfio_device *device)
> > +{
> > + struct vfio_device *existing_device;
> > + struct device *dev;
> > + int ret = 0, minor;
> > +
> > + mutex_lock(&vfio.device_lock);
> > + list_for_each_entry(existing_device, &vfio.device_list, vfio_next) {
> > + if (existing_device == device) {
> > + ret = -EBUSY;
> > + goto out_unlock;
>
> This indicates a bug in the caller, doesn't it? Should it be a BUG or
> WARN instead?

This call is initiated by userspace. Per Jason's suggestion we don't
even need the check, so no lock is required.

Thanks
Kevin