Re: [PATCH 2/2] PCI: Create device link for NVIDIA GPU

From: Abhishek Sahu
Date: Mon Jun 03 2019 - 04:04:43 EST


Thanks Bjorn for your review.

On 6/1/2019 2:09 AM, Bjorn Helgaas wrote:
> [+cc Lukas, author of 07f4f97d7b4b ("vga_switcheroo: Use device link
> for HDA controller")]
>
> On Fri, May 31, 2019 at 10:31:09AM +0530, Abhishek Sahu wrote:
>> NVIDIA Turing GPUs include hardware support for USB Type-C and
>> VirtualLink. It helps in delivering the power, display, and data
>> required to power VR headsets through a single USB Type-C connector.
>> The Turing GPU is a multi-function PCI device has the following
>> four functions:
>>
>> - VGA display controller (Function 0)
>> - Audio controller (Function 1)
>> - USB xHCI Host controller (Function 2)
>> - USB Type-C USCI controller (Function 3)
>>
>> The function 0 is tightly coupled with other functions in the
>> hardware. When function 0 goes in runtime suspended state,
>
> "Runtime suspended" is a Linux concept, not a PCI concept. Please
> replace this with the appropriate PCI term, e.g., "D3hot" or whatever
> it is.

Sure. I will change this.

>
>> then it will do power gating for most of the hardware blocks.
>> Some of these hardware blocks are used by other functions which
>> leads to functional failure. So if any of these functions (1/2/3)
>> are active, then function 0 should also be in active state.
>
> Instead of "active" and "active state", please use the specific states
> required in terms of PCI.

Sure. I will use specific states name.

>
>> 'commit 07f4f97d7b4b ("vga_switcheroo: Use device link for
>> HDA controller")' creates the device link from function 1 to
>> function 0. A similar kind of device link needs to be created
>> between function 0 and functions 2 and 3 for NVIDIA Turing GPU.
>
> I can't point to language that addresses this, but this sounds like a
> case of the GPU not conforming to the PCI spec. The general
> assumption is that the OS should be able to discover everything it
> needs to do power management directly from the architected PCI config
> space.

The GPU is following PCIe spec but following is the implementation
from HW side

Normal GPU has VGA and Audio controller in which Audio is dependent
upon VGA. For these GPU, the VGA is managed by GPU driver and Audio is
managed by sound driver. Now the VGA driver can go to D3hot while
Audio still in D0 if there was no device link (added in commit
07f4f97d7b4b). It would lead to Audio functionality failure. The
device link is making sure that GPU is in D0 while Audio is D0.

Now the NVIDIA Turing GPU has one USB Type-C port mainly to support
virtual reality headset. With default mode, this USB port will act as
normal USB port and any USB device can be connected over it. It will
be managed with PCI xHCI USB controller driver. Now, to support VR
headset, This USB Type-C alternate mode is going to be used. This
alternate mode setting will be managed with [1] and [2] which is part
of 5.2-rc1. More detail for this is available in [3] and virtual link
open industry standard [4]. These VR frame-buffers are internally
going to be rendered by GPU only and managed by function 0 (VGA)
driver. Now, from HW side, we need to make sure VGA is in D0 while any
of other functions are in D0.

[1]
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/usb/typec/altmodes/nvidia.c
[2]
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/i2c/busses/i2c-nvidia-gpu.c
[3]
https://fullcirclemagazine.org/2019/04/30/nvidia-creates-free-virtual-link-driver-for-linux/
[4] https://sites.google.com/view/virtuallink-consortium/home

>
> It is definitely not ideal to have to add quirks like this for devices
> designed this way. Such quirks force us to do otherwise unnecessary
> OS updates as new devices are released.

I can understand but this is the HW requirement. To support,
VR headset in GPU, the HW has provided USB Type-C. Now, from
SW side, we are working on supporting this with runtime PM.
Currently runtime PM is not possible without adding the
dependencies between different functions.

>
> If all the devices in a multi-function device were connected
> intimately enough that they all had to be managed by the same driver,
> I could imagine putting these non-discoverable dependencies in the
> driver. But these devices don't seem to be related in that way.
> > If there *is* spec language that allows dependencies like this, please
> include the citation in your commit log.
>

The PCIe specification treats each function separately but GPU case is
different. So, it won't be part of PCIe spec. in GPU, the different
kind of devices are internally coupled in HW but still needs to be
managed by different driver.

>> This patch does the same and create the required device links. It
>> will make function 0 to be runtime PM active if other functions
>> are still active.
>>
>> Signed-off-by: Abhishek Sahu <abhsahu@xxxxxxxxxx>
>> ---
>> drivers/pci/quirks.c | 23 +++++++++++++++++++++++
>> 1 file changed, 23 insertions(+)
>>
>> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
>> index a20f7771a323..afdbc199efc5 100644
>> --- a/drivers/pci/quirks.c
>> +++ b/drivers/pci/quirks.c
>> @@ -4967,6 +4967,29 @@ DECLARE_PCI_FIXUP_CLASS_FINAL(PCI_VENDOR_ID_AMD, PCI_ANY_ID,
>> DECLARE_PCI_FIXUP_CLASS_FINAL(PCI_VENDOR_ID_NVIDIA, PCI_ANY_ID,
>> PCI_CLASS_MULTIMEDIA_HD_AUDIO, 8, quirk_gpu_hda);
>>
>> +/* Create device link for NVIDIA GPU with integrated USB controller to VGA. */
>> +static void quirk_gpu_usb(struct pci_dev *usb)
>> +{
>> + pci_create_device_link_with_vga(usb, 2);
>> +}
>> +DECLARE_PCI_FIXUP_CLASS_FINAL(PCI_VENDOR_ID_NVIDIA, PCI_ANY_ID,
>> + PCI_CLASS_SERIAL_USB, 8, quirk_gpu_usb);
>> +
>> +/*
>> + * Create device link for NVIDIA GPU with integrated Type-C UCSI controller
>> + * to VGA. Currently there is no class code defined for UCSI device over PCI
>> + * so using UNKNOWN class for now and it will be updated when UCSI
>> + * over PCI gets a class code.
>
> Ugh. Here's a good example of having to do yet another OS update.
>

Correct. But currently we don't have any other way.
Same thing we did for
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/i2c/busses/i2c-nvidia-gpu.c#n248

Once it gets a class code then we can replace at all the places.

Regards,
Abhishek

>> + */
>> +#define PCI_CLASS_SERIAL_UNKNOWN 0x0c80
>> +static void quirk_gpu_usb_typec_ucsi(struct pci_dev *ucsi)
>> +{
>> + pci_create_device_link_with_vga(ucsi, 3);
>> +}
>> +DECLARE_PCI_FIXUP_CLASS_FINAL(PCI_VENDOR_ID_NVIDIA, PCI_ANY_ID,
>> + PCI_CLASS_SERIAL_UNKNOWN, 8,
>> + quirk_gpu_usb_typec_ucsi);
>> +
>> /*
>> * Some IDT switches incorrectly flag an ACS Source Validation error on
>> * completions for config read requests even though PCIe r4.0, sec
>> --
>> 2.17.1
>>