[+cc Hui]And this quirk doesn't work in my case, in my case it is booting reset rather than FLR, so the "bool probe" is true and the pci_dev_wait() fails.
On Sat, Jul 12, 2025 at 07:30:28PM +0800, Weili Qian wrote:
Prior to commit d591f6804e7e ("PCI: Wait for device readiness withAs far as I can tell, there's nothing specific to Kungpeng devices
Configuration RRS"), pci_dev_wait() polls PCI_COMMAND register until
its value is not ~0(i.e., PCI_ERROR_RESPONSE). After d591f6804e7e,
if the Configuration Request Retry Status Software Visibility (RRS SV)
is enabled, pci_dev_wait() polls PCI_VENDOR_ID register until its value
is not the reserved Vendor ID value 0x0001.
On Kunpeng accelerator devices, RRS SV is enabled. However,
when the virtual function's FLR (Function Level Reset) is not
ready, the pci_dev_wait() reads the PCI_VENDOR_ID register and gets
the value 0xffff instead of 0x0001. It then incorrectly assumes this
is a valid Vendor ID and concludes the device is ready, returning
successfully. In reality, the function may not be fully ready, leading
to the device becoming unavailable.
A 100ms wait period is already implemented before calling pci_dev_wait().
In most cases, FLR completes within 100ms. However, to eliminate the
risk of function being unavailable due to an incomplete FLR, a
device-specific reset is added. After pcie_flr(), the function continues
to poll PCI_COMMAND register until its value is no longer ~0.
here. We've seen a similar issue with Intel NVMe devices [1], and I
don't want a whole mess of quirks and device-specific reset methods.
We need some sort of generic solution for this. My understanding was
that if devices are not ready 100ms after a reset, they are required
to respond with RRS. Maybe these devices are defective. Or maybe my
understanding is incorrect. Either way, I think we should at least
check for a PCIe error before assuming that 0xffff is a valid
response.
[1] https://lore.kernel.org/linux-pci/20250611101442.387378-1-hui.wang@xxxxxxxxxxxxx/
Fixes: d591f6804e7e ("PCI: Wait for device readiness with Configuration RRS")
Signed-off-by: Weili Qian <qianweili@xxxxxxxxxx>
---
drivers/pci/quirks.c | 36 ++++++++++++++++++++++++++++++++++++
1 file changed, 36 insertions(+)
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index d7f4ee634263..1df1756257d2 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -4205,6 +4205,36 @@ static int reset_hinic_vf_dev(struct pci_dev *pdev, bool probe)
return 0;
}
+#define KUNPENG_OPERATION_WAIT_CNT 3000
+#define KUNPENG_RESET_WAIT_TIME 20
+
+/* Device-specific reset method for Kunpeng accelerator virtual functions */
+static int reset_kunpeng_acc_vf_dev(struct pci_dev *pdev, bool probe)
+{
+ u32 wait_cnt = 0;
+ u32 cmd;
+
+ if (probe)
+ return 0;
+
+ pcie_flr(pdev);
+
+ do {
+ pci_read_config_dword(pdev, PCI_COMMAND, &cmd);
+ if (!PCI_POSSIBLE_ERROR(cmd))
+ break;
+
+ if (++wait_cnt > KUNPENG_OPERATION_WAIT_CNT) {
+ pci_warn(pdev, "wait for FLR ready timeout; giving up\n");
+ return -ENOTTY;
+ }
+
+ msleep(KUNPENG_RESET_WAIT_TIME);
+ } while (true);
+
+ return 0;
+}
+
static const struct pci_dev_reset_methods pci_dev_reset_methods[] = {
{ PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_82599_SFP_VF,
reset_intel_82599_sfp_virtfn },
@@ -4220,6 +4250,12 @@ static const struct pci_dev_reset_methods pci_dev_reset_methods[] = {
reset_chelsio_generic_dev },
{ PCI_VENDOR_ID_HUAWEI, PCI_DEVICE_ID_HINIC_VF,
reset_hinic_vf_dev },
+ { PCI_VENDOR_ID_HUAWEI, PCI_DEVICE_ID_HUAWEI_ZIP_VF,
+ reset_kunpeng_acc_vf_dev },
+ { PCI_VENDOR_ID_HUAWEI, PCI_DEVICE_ID_HUAWEI_SEC_VF,
+ reset_kunpeng_acc_vf_dev },
+ { PCI_VENDOR_ID_HUAWEI, PCI_DEVICE_ID_HUAWEI_HPRE_VF,
+ reset_kunpeng_acc_vf_dev },
{ 0 }
};
--
2.33.0