Re: [PATCH v3 02/30] drivers: hv: dxgkrnl: Driver initialization and loading

From: Greg KH
Date: Wed Mar 02 2022 - 02:54:59 EST


On Tue, Mar 01, 2022 at 02:47:28PM -0800, Iouri Tarassov wrote:
> On 3/1/2022 2:23 PM, Wei Liu wrote:
> > On Tue, Mar 01, 2022 at 09:45:41PM +0100, Greg KH wrote:
> > > On Tue, Mar 01, 2022 at 11:45:49AM -0800, Iouri Tarassov wrote:
> > > > - Create skeleton and add basic functionality for the
> > > > hyper-v compute device driver (dxgkrnl).
> > > > > > - Register for PCI and VM bus driver notifications and
> > > > handle initialization of VM bus channels.
> > > > > > - Connect the dxgkrnl module to the drivers/hv/ Makefile and
> > Kconfig
> > > > > > - Create a MAINTAINERS entry
> > > > > > A VM bus channel is a communication interface between the
> > hyper-v guest
> > > > and the host. The are two type of VM bus channels, used in the driver:
> > > > - the global channel
> > > > - per virtual compute device channel
> > > > > > A PCI device is created for each virtual compute device,
> > projected
> > > > by the host. The device vendor is PCI_VENDOR_ID_MICROSOFT and device
> > > > id is PCI_DEVICE_ID_VIRTUAL_RENDER. dxg_pci_probe_device handles
> > > > arrival of such devices. The PCI config space of the virtual compute
> > > > device has luid of the corresponding virtual compute device VM
> > > > bus channel. This is how the compute device adapter objects are
> > > > linked to VM bus channels.
> > > > > > VM bus interface version is exchanged by reading/writing the PCI
> > config
> > > > space of the virtual compute device.
> > > > > > The IO space is used to handle CPU accessible compute device
> > > > allocations. Hyper-v allocates IO space for the global VM bus channel.
> > > > > > Signed-off-by: Iouri Tarassov <iourit@xxxxxxxxxxxxxxxxxxx>
> > > > Please work with internal developers to get reviews from them first,
> > > before requiring the kernel community to point out basic issues. It
> > > will save you a lot of time and stop us from feeling like we are having
> > > our time wasted.
> > > > Some simple examples below that your coworkers should have caught:
> > > > > --- /dev/null
> > > > +++ b/drivers/hv/dxgkrnl/dxgkrnl.h
> > > > @@ -0,0 +1,119 @@
> > > > +/* SPDX-License-Identifier: GPL-2.0 */
> > > > +
> > > > +/*
> > > > + * Copyright (c) 2019, Microsoft Corporation.
> > > > It is now 2022 :)
> > > > > +void init_ioctls(void);
> > > > That is a horrible global function name you just added to the
> > kernel's
> > > namespace for a single driver :(
> > > > > +long dxgk_unlocked_ioctl(struct file *f, unsigned int p1,
> > unsigned long p2);
> > > > +
> > > > +static inline void guid_to_luid(guid_t *guid, struct winluid *luid)
> > > > +{
> > > > + *luid = *(struct winluid *)&guid->b[0];
> > > > Why is the cast needed? Shouldn't you use real types in your
> > > structures?
> > > > > +/*
> > > > + * The interface version is used to ensure that the host and the guest use the
> > > > + * same VM bus protocol. It needs to be incremented every time the VM bus
> > > > + * interface changes. DXGK_VMBUS_LAST_COMPATIBLE_INTERFACE_VERSION is
> > > > + * incremented each time the earlier versions of the interface are no longer
> > > > + * compatible with the current version.
> > > > + */
> > > > +#define DXGK_VMBUS_INTERFACE_VERSION_OLD 27
> > > > +#define DXGK_VMBUS_INTERFACE_VERSION 40
> > > > +#define DXGK_VMBUS_LAST_COMPATIBLE_INTERFACE_VERSION 16
> > > > Where do these numbers come from, the hypervisor specification?
> > > > > +/*
> > > > + * Pointer to the global device data. By design
> > > > + * there is a single vGPU device on the VM bus and a single /dev/dxg device
> > > > + * is created.
> > > > + */
> > > > +struct dxgglobal *dxgglobal;
> > > > No, make this per-device, NEVER have a single device for your
> > driver.
> > > The Linux driver model makes it harder to do it this way than to do it
> > > correctly. Do it correctly please and have no global structures like
> > > this.
> > >
> >
> > This may not be as big an issue as you thought. The device discovery is
> > still done via the normal VMBus probing routine. For all intents and
> > purposes the dxgglobal structure can be broken down into per device
> > fields and a global structure which contains the protocol versioning
> > information -- my understanding is there will always be a global
> > structure to hold information related to the backend, regardless of how
> > many devices there are.
> >
> > I definitely think splitting is doable, but I also understand why Iouri
> > does not want to do it _now_ given there is no such a model for multiple
> > devices yet, so anything we put into the per-device structure could be
> > incomplete and it requires further changing when such a model arrives
> > later.
> >
> > Iouri, please correct me if I have the wrong mental model here.
> >
> > All in all, I hope this is not going to be a deal breaker for the
> > acceptance of this driver.
> >
> > Thanks,
> > Wei.
>
> I agree with Wei that there always be global driver data.

Why?

> The driver reflects what the host offers and also it must provide the same
> interface to user mode as the host driver does. This is because we want the
> user mode clients to use the same device interface as if they are working on
> the host directly.

That's fine, put that state in the device interface.

> By design a single global VMBus channel is offered by the host and a single
> /dev/dxg device is created. The /dev/dxg device provides interface to enumerate
> virtual compute devices via an ioctl.

You have device data for each vmbus channel given to the driver, use
that.

> If we are to change this model, we would need to make changes to user mode
> clients, which is a big re-design change, affecting many hardware vendors.

This should not be visible to userspace at all. If so, you are really
doing something odd.

thanks,

greg k-h