Re: [PATCH v3 02/30] drivers: hv: dxgkrnl: Driver initialization and loading

From: Iouri Tarassov
Date: Wed Mar 02 2022 - 19:17:19 EST


Hi Greg,

Thanks a lot for reviewing the patches.

I appreciate the very detailed comments. I my reply I omitted the
comments, which I agree with and will address in subsequent patches.

On 3/1/2022 12:45 PM, Greg KH wrote:

>
> > +long dxgk_unlocked_ioctl(struct file *f, unsigned int p1, unsigned long p2);
> > +
> > +static inline void guid_to_luid(guid_t *guid, struct winluid *luid)
> > +{
> > + *luid = *(struct winluid *)&guid->b[0];
>
> Why is the cast needed? Shouldn't you use real types in your
> structures?


The VMBus channel ID, which is GUID, is present in the PCI config space
of the compute device. The convention is that the lower part of the GUID
is LUID (local unique identifier). This function just converts GUID to
LUID. I am not sure what is the ask here.


> > +/*
> > + * The interface version is used to ensure that the host and the guest use the
> > + * same VM bus protocol. It needs to be incremented every time the VM bus
> > + * interface changes. DXGK_VMBUS_LAST_COMPATIBLE_INTERFACE_VERSION is
> > + * incremented each time the earlier versions of the interface are no longer
> > + * compatible with the current version.
> > + */
> > +#define DXGK_VMBUS_INTERFACE_VERSION_OLD 27
> > +#define DXGK_VMBUS_INTERFACE_VERSION 40
> > +#define DXGK_VMBUS_LAST_COMPATIBLE_INTERFACE_VERSION 16
>
> Where do these numbers come from, the hypervisor specification?


The definitions are for the VMBus interface between the virtual compute
device in the guest and the Windows host. The interface version is
updated when the VMBus protocol is changed. These are specific to the
virtual compute device, not to the hypervisor.


> > +/*
> > + * Pointer to the global device data. By design
> > + * there is a single vGPU device on the VM bus and a single /dev/dxg device
> > + * is created.
> > + */
> > +struct dxgglobal *dxgglobal;
>
> No, make this per-device, NEVER have a single device for your driver.
> The Linux driver model makes it harder to do it this way than to do it
> correctly. Do it correctly please and have no global structures like
> this.


This will be addressed in the next version of patches.


> > +#define DXGKRNL_VERSION 0x2216
>
> What does this mean?


This is the driver implementation version. There are many Microsoft
shipped Linux kernels, which include the driver. The version allows to
quickly determine, which level of functionality or bug fixes is
implemented in the current driver.


> > +
> > +/*
> > + * Interface with the PCI driver
> > + */
> > +
> > +/*
> > + * Part of the PCI config space of the vGPU device is used for vGPU
> > + * configuration data. Reading/writing of the PCI config space is forwarded
> > + * to the host.
> > + */
>
> > +/* DXGK_VMBUS_INTERFACE_VERSION (u32) */
>
> What is this?


This comment explains that the value of DXGK_VMBUS_INTERFACE_VERSION is
located at the offset DXGK_VMBUS_CHANNEL_ID_OFFSET in the PCI config space.


> > +#define DXGK_VMBUS_VERSION_OFFSET (DXGK_VMBUS_CHANNEL_ID_OFFSET + \
> > + sizeof(guid_t))
>
> offsetof() is your friend, please use it.


There is no structure definition for the PCI config space of the device.
Therefore, offsets are computed by using data size.


> > +/* Luid of the virtual GPU on the host (struct winluid) */
>
> What is a "luid"?


LUID is "Locally Unique Identifier". It is used in Windows and its value
is guaranteed to be unique until the computer reboots. This is similar
to GUID, but is it not globally unique. Dxgkrnl on the host uses LUIDs
to identify compute adapter objects.
I added a comment to the definition in the header.


> > + ret = vmbus_driver_register(&dxg_drv);
> > + if (ret) {
> > + pr_err("vmbus_driver_register failed: %d", ret);
> > + return ret;
> > + }
> > + dxgglobal->vmbus_registered = true;
> > +
> > + pr_info("%s Version: %x", __func__, DXGKRNL_VERSION);
>
> When drivers work, they are quiet.
>
> Also, in-kernel modules do not have a version, they are part of the
> kernel tree, and that's the "version" they have. You can drop the
> individual version here, it makes no sense anymore.


Microsoft develops many Linux kernels when the dxgkrnl driver is
included (EFLOW, Android, WSL). The kernels are built at different times
and might include a different version of the driver. Printing the
version allows to quickly determine what functionality and bug fixes are
included in the driver.

I see that other in-tree drivers print some information during
init_module (like nfblock.c).

This is done for convenience. So instead of tracking multiple kernel
versions, we can track the dxgkrnl driver version. I am ok to remove it
if it really hurts something. It is only a single line per driver load.


Thanks

Iouri