Re: [PATCH] PCI: Make SR-IOV capable GPU working on the SR-IOV incapable platform

From: Alex Williamson
Date: Thu May 11 2017 - 23:21:11 EST


On Fri, 12 May 2017 02:50:32 +0000
"Cheng, Collins" <Collins.Cheng@xxxxxxx> wrote:

> Hi Helgaas,
>
> Some AMD GPUs have hardware support for graphics SR-IOV.
> If the SR-IOV capable GPU is plugged into the SR-IOV incapable
> platform. It would cause a problem on PCI resource allocation in
> current Linux kernel.
>
> Therefore in order to allow the PF (Physical Function) device of
> SR-IOV capable GPU to work on the SR-IOV incapable platform,
> it is required to verify conditions for initializing BAR resources
> on AMD SR-IOV capable GPUs.
>
> If the device is an AMD graphics device and it supports
> SR-IOV it will require a large amount of resources.
> Before calling sriov_init() must ensure that the system
> BIOS also supports SR-IOV and that system BIOS has been
> able to allocate enough resources.
> If the VF BARs are zero then the system BIOS does not
> support SR-IOV or it could not allocate the resources
> and this platform will not support AMD graphics SR-IOV.
> Therefore do not call sriov_init().
> If the system BIOS does support SR-IOV then the VF BARs
> will be properly initialized to non-zero values.
>
> Below is the patch against to Kernel 4.8 & 4.9. Please review.
>
> I checked the drivers/pci/quirks.c, it looks the workarounds/fixes in
> quirks.c are for specific devices and one or more device ID are defined
> for the specific devices. However my patch is for all AMD SR-IOV
> capable GPUs, that includes all existing and future AMD server GPUs.
> So it doesn't seem like a good fit to put the fix in quirks.c.


Why is an AMD graphics card unique here? Doesn't sriov_init() always
need to be able to deal with devices of any type where the BIOS hasn't
initialized the SR-IOV capability? Some SR-IOV devices can fit their
VFs within a minimum bridge aperture, most cannot. I don't understand
why the VF resource requirements being exceptionally large dictates
that they receive special handling. Thanks,

Alex

> Signed-off-by: Collins Cheng <collins.cheng@xxxxxxx>
> ---
> drivers/pci/iov.c | 63 ++++++++++++++++++++++++++++++++++++++++++++++++++++---
> 1 file changed, 60 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
> index e30f05c..e4f1405 100644
> --- a/drivers/pci/iov.c
> +++ b/drivers/pci/iov.c
> @@ -523,6 +523,45 @@ static void sriov_restore_state(struct pci_dev *dev)
> msleep(100);
> }
>
> +/*
> + * pci_vf_bar_valid - check if VF BARs have resource allocated
> + * @dev: the PCI device
> + * @pos: register offset of SR-IOV capability in PCI config space
> + * Returns true any VF BAR has resource allocated, false
> + * if all VF BARs are empty.
> + */
> +static bool pci_vf_bar_valid(struct pci_dev *dev, int pos)
> +{
> + int i;
> + u32 bar_value;
> + u32 bar_size_mask = ~(PCI_BASE_ADDRESS_SPACE |
> + PCI_BASE_ADDRESS_MEM_TYPE_64 |
> + PCI_BASE_ADDRESS_MEM_PREFETCH);
> +
> + for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
> + pci_read_config_dword(dev, pos + PCI_SRIOV_BAR + i * 4, &bar_value);
> + if (bar_value & bar_size_mask)
> + return true;
> + }
> +
> + return false;
> +}
> +
> +/*
> + * is_amd_display_adapter - check if it is an AMD/ATI GPU device
> + * @dev: the PCI device
> + *
> + * Returns true if device is an AMD/ATI display adapter,
> + * otherwise return false.
> + */
> +
> +static bool is_amd_display_adapter(struct pci_dev *dev)
> +{
> + return (((dev->class >> 16) == PCI_BASE_CLASS_DISPLAY) &&
> + (dev->vendor == PCI_VENDOR_ID_ATI ||
> + dev->vendor == PCI_VENDOR_ID_AMD));
> +}
> +
> /**
> * pci_iov_init - initialize the IOV capability
> * @dev: the PCI device
> @@ -537,9 +576,27 @@ int pci_iov_init(struct pci_dev *dev)
> return -ENODEV;
>
> pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_SRIOV);
> - if (pos)
> - return sriov_init(dev, pos);
> -
> + if (pos) {
> + /*
> + * If the device is an AMD graphics device and it supports
> + * SR-IOV it will require a large amount of resources.
> + * Before calling sriov_init() must ensure that the system
> + * BIOS also supports SR-IOV and that system BIOS has been
> + * able to allocate enough resources.
> + * If the VF BARs are zero then the system BIOS does not
> + * support SR-IOV or it could not allocate the resources
> + * and this platform will not support AMD graphics SR-IOV.
> + * Therefore do not call sriov_init().
> + * If the system BIOS does support SR-IOV then the VF BARs
> + * will be properly initialized to non-zero values.
> + */
> + if (is_amd_display_adapter(dev)) {
> + if (pci_vf_bar_valid(dev, pos))
> + return sriov_init(dev, pos);
> + } else {
> + return sriov_init(dev, pos);
> + }
> + }
> return -ENODEV;
> }
>