[ISSUE] Cannot enable VF after remove/rescan

From: Radoslaw Tyl
Date: Wed Nov 30 2022 - 07:00:28 EST


Hi Yicong,

The VF offset depends on the state of the ARIHierarchy bit (+/-) in the PCI
config space. After a power cycle this bit is set (+). When we force removal
of the port

# echo 1 > /sys/bus/pci/devices/0000\:04\:00.0/remove

and then rescan the bus

# echo 1 > /sys/bus/pci/rescan

this causes ARIHierarchy to be cleared (-) and the offset to be set to 384.
See the "Intel® 82599 10 GbE Controller Datasheet",
chapter 7.10.2.6.1.1 "ARI Mode". In non-ARI mode, 1 is added to the bus
number, and that is why you get the error.
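
To make the bus arithmetic concrete, here is a small sketch of how a VF's
address follows from the PF's address and the VF offset. The formula mirrors
what the kernel does in pci_iov_virtfn_bus() and pci_iov_virtfn_devfn(). The
helper name vf_bdf, the stride of 2 and the ARI-mode offset of 128 are
assumptions for illustration, not values taken from this report.

```shell
# Hypothetical helper: vf_bdf BUS DEVFN OFFSET STRIDE VF_INDEX
# Prints the bus:device.function of the given VF.
vf_bdf() {
    bus=$(( $1 )); devfn=$(( $2 )); offset=$(( $3 ))
    stride=$(( $4 )); idx=$(( $5 ))
    # Routing-ID delta relative to the PF, as in the kernel helpers
    rid=$(( devfn + offset + stride * idx ))
    vf_bus=$(( bus + (rid >> 8) ))      # carry into the bus number
    vf_devfn=$(( rid & 0xff ))          # low 8 bits: device/function
    printf '%02x:%02x.%x\n' "$vf_bus" $(( vf_devfn >> 3 )) $(( vf_devfn & 7 ))
}

vf_bdf 0x04 0x00 384 2 0   # offset from this report (non-ARI); stride assumed
vf_bdf 0x04 0x00 128 2 0   # assumed ARI-mode offset, for comparison
```

With offset 384 the first VF lands on bus 05, one above the PF's bus. With
the assumed offset of 128 it stays on bus 04.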

During the boot sequence, when all physical functions (PFs) are initialized,
the PCI driver sets ARI on the first PF it encounters and skips it for the
others. When we remove that first PF, the one that had ARI enabled during
initialization, and then rescan, the PCI driver only takes into account the
PFs that already exist on the bus. As a result the PCI driver does not set ARI
on any of them, and the offset is set to 384.


static int sriov_init(struct pci_dev *dev, int pos)
{
        ctrl = 0;
        list_for_each_entry(pdev, &dev->bus->devices, bus_list)
                if (pdev->is_physfn)    <------
                        goto found;
        pdev = NULL;
        if (pci_ari_enabled(dev->bus))
                ctrl |= PCI_SRIOV_CTRL_ARI;

found:

...
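
One way to confirm this on a given system is to compare the SR-IOV capability
before and after the remove/rescan. The sketch below filters "lspci -vvv"
output for the ARIHierarchy flag and the VF offset. The helper name
sriov_state is hypothetical, and the patterns assume the usual pciutils
output format ("IOVCtl: ... ARIHierarchy+" and "VF offset: N, stride: ..."),
which may differ between versions.

```shell
# Hypothetical helper: reads `lspci -vvv` output on stdin and prints
# the ARIHierarchy flag and the VF offset, one per line.
sriov_state() {
    sed -n \
        -e 's/.*ARIHierarchy\([+-]\).*/ARIHierarchy=\1/p' \
        -e 's/.*VF offset: \([0-9][0-9]*\).*/offset=\1/p'
}

# Usage (as root, because the SR-IOV capability sits in extended
# config space that unprivileged users cannot read):
#   lspci -s 04:00.0 -vvv | sriov_state
```

Running it once after boot and once after the remove/rescan should show
whether the flag and the offset changed in the way described above.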


This is not a problem in the device driver but in the PCI driver.
As a workaround you may use one of these options:

1. Remove all PFs belonging to the bus before attempting the rescan.
2. Disable ARI: in GRUB, add "pci=noari" to the kernel parameters.
# noari do not use PCIe ARI.
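
Option 1 could be scripted roughly as follows. This is a sketch, not a tested
tool: the function name remove_pfs_and_rescan and the SYSFS override are made
up here, and it assumes that a PF can be recognised by the presence of the
sriov_totalvfs attribute in its sysfs directory. It must run as root.

```shell
# Hypothetical helper: remove_pfs_and_rescan DOMAIN:BUS   (e.g. 0000:04)
# Removes every SR-IOV capable PF on the given bus, then rescans.
remove_pfs_and_rescan() {
    prefix=$1
    sysfs=${SYSFS:-/sys}    # override only for dry runs on a fake tree
    for dev in "$sysfs"/bus/pci/devices/"$prefix":*; do
        # Assumption: only PFs expose sriov_totalvfs; skip everything else
        [ -e "$dev/sriov_totalvfs" ] || continue
        echo 1 > "$dev/remove"
    done
    echo 1 > "$sysfs/bus/pci/rescan"
}

# Usage:
#   remove_pfs_and_rescan 0000:04
```

Because no PF is left on the bus when the rescan starts, the first PF found
should take the ARI path again, as it does at boot.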


--
BR,
Radek