Re: [RFC 03/20] vfio: Add vfio_[un]register_device()

From: david@xxxxxxxxxxxxxxxxxxxxx
Date: Wed Sep 29 2021 - 23:02:51 EST


On Wed, Sep 29, 2021 at 09:22:30AM -0300, Jason Gunthorpe wrote:
> On Wed, Sep 29, 2021 at 12:46:14PM +1000, david@xxxxxxxxxxxxxxxxxxxxx wrote:
> > On Tue, Sep 21, 2021 at 10:00:14PM -0300, Jason Gunthorpe wrote:
> > > On Wed, Sep 22, 2021 at 12:54:02AM +0000, Tian, Kevin wrote:
> > > > > From: Jason Gunthorpe <jgg@xxxxxxxxxx>
> > > > > Sent: Wednesday, September 22, 2021 12:01 AM
> > > > >
> > > > > > One open about how to organize the device nodes under
> > > > > /dev/vfio/devices/.
> > > > > > This RFC adopts a simple policy by keeping a flat layout with mixed
> > > > > devname
> > > > > > from all kinds of devices. The prerequisite of this model is that devnames
> > > > > > from different bus types are unique formats:
> > > > >
> > > > > This isn't reliable, the devname should just be vfio0, vfio1, etc
> > > > >
> > > > > The userspace can learn the correct major/minor by inspecting the
> > > > > sysfs.
> > > > >
> > > > > This whole concept should disappear into the prior patch that adds the
> > > > > struct device in the first place, and I think most of the code here
> > > > > can be deleted once the struct device is used properly.
> > > > >
> > > >
> > > > Can you help elaborate above flow? This is one area where we need
> > > > more guidance.
> > > >
> > > > When Qemu accepts an option "-device vfio-pci,host=DDDD:BB:DD.F",
> > > > how does Qemu identify which vfio0/1/... is associated with the specified
> > > > DDDD:BB:DD.F?
> > >
> > > When done properly in the kernel the file:
> > >
> > > /sys/bus/pci/devices/DDDD:BB:DD.F/vfio/vfioX/dev
> > >
> > > Will contain the major:minor of the VFIO device.
> > >
> > > Userspace then opens the /dev/vfio/devices/vfioX and checks with fstat
> > > that the major:minor matches.
> > >
> > > in the above pattern "pci" and "DDDD:BB:DD.FF" are the arguments passed
> > > to qemu.
> >
> > I thought part of the appeal of the device centric model was less
> > grovelling around in sysfs for information. Using type/address
> > directly in /dev seems simpler than having to dig around matching
> > things here.
>
> I would say more regular grovelling. Starting from a sysfs device
> directory and querying the VFIO cdev associated with it is much more
> normal than what happens today, which also includes passing sysfs
> information into an ioctl :\

Hm.. ok. Clearly I'm unfamiliar with the things that do that. Other
than current VFIO, the only model I've really seen is where you just
point your program at a device node.

> > Note that this doesn't have to be done in kernel: you could have the
> > kernel just call them /dev/vfio/devices/vfio0, ... but add udev rules
> > that create symlinks from say /dev/vfio/pci/DDDD:BB:SS.F ->
> > ../devices/vfioXX based on the sysfs information.
>
> This is the right approach if people want to do this, but I'm not sure
> it is worth it given backwards compat requires the sysfs path as
> input.

You mean for userspace that needs to be able to go back to the old
VFIO interface as well? It seems silly to force this sysfs mucking
about on new programs that depend on the new interface.

> We may as well stick with sysfs as the command line interface
> for userspace tools.

> And I certainly don't want to see userspace tools trying to reverse a
> sysfs path into a /dev/ symlink name when they can directly and
> reliably learn the correct cdev from the sysfspath.

Um.. sure.. but they can get the correct cdev from the sysfspath no
matter how we name the cdevs.
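
For concreteness, the lookup described above (read the "dev" attribute
under the sysfs device directory, open the node, confirm with fstat) could
look roughly like the sketch below. It is Python purely for brevity. It
assumes the sysfs layout and the /dev/vfio/devices directory proposed in
this thread, neither of which is a settled interface, and the function
names are made up for illustration.

```python
import os
import stat
from pathlib import Path


def find_vfio_cdev(sysfs_dev, sysfs_subdir="vfio"):
    """Return (name, major, minor) for the VFIO cdev of a sysfs device dir.

    Assumes the layout proposed in this thread: <sysfs_dev>/vfio/vfioX/dev,
    with 'dev' holding "major:minor" as for any other char device.
    """
    for entry in sorted(Path(sysfs_dev, sysfs_subdir).iterdir()):
        dev_attr = entry / "dev"
        if dev_attr.is_file():
            major, minor = dev_attr.read_text().strip().split(":")
            return entry.name, int(major), int(minor)
    raise FileNotFoundError(f"no VFIO cdev under {sysfs_dev}")


def open_checked(node, major, minor):
    """Open a device node and confirm with fstat that it is the expected cdev."""
    fd = os.open(node, os.O_RDWR)
    st = os.fstat(fd)
    found = (os.major(st.st_rdev), os.minor(st.st_rdev))
    if not stat.S_ISCHR(st.st_mode) or found != (major, minor):
        os.close(fd)
        raise OSError(f"{node} is not char device {major}:{minor}")
    return fd


def open_vfio_device(bus, addr, sysfs_root="/sys",
                     dev_dir="/dev/vfio/devices"):
    """Map (bus, address) to an open fd; both default paths are assumptions."""
    sysfs_dev = Path(sysfs_root, "bus", bus, "devices", addr)
    name, major, minor = find_vfio_cdev(sysfs_dev)
    return open_checked(os.path.join(dev_dir, name), major, minor)
```

Nothing in there cares what the node is called: the name only has to be
whatever directory entry shows up under the sysfs device, so the same
lookup works for vfioX or for a type/address based name.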

--
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson
