Re: [PATCH v2] PCI: Explicitly put devices into D0 when initializing - Bug report

From: Alexey Kardashevskiy
Date: Thu Jun 19 2025 - 05:14:13 EST


On 12/6/25 06:45, Mario Limonciello wrote:
On 6/11/2025 9:13 AM, Cabiddu, Giovanni wrote:
On Wed, Jun 11, 2025 at 10:00:02AM -0600, Alex Williamson wrote:
On Wed, 11 Jun 2025 06:50:59 -0700
Mario Limonciello <superm1@xxxxxxxxxx> wrote:

On 6/11/2025 5:52 AM, Cabiddu, Giovanni wrote:
Hi Mario, Bjorn and Alex,

On Wed, Apr 23, 2025 at 11:31:32PM -0500, Mario Limonciello wrote:
From: Mario Limonciello <mario.limonciello@xxxxxxx>

The AMD BIOS team has root-caused an issue where NVMe storage failed to come
back from suspend because _REG was never called when the NVMe device was probed.

commit 112a7f9c8edbf ("PCI/ACPI: Call _REG when transitioning D-states")
added support for calling _REG when transitioning D-states, but this only
works if the device actually "transitions" D-states.

commit 967577b062417 ("PCI/PM: Keep runtime PM enabled for unbound PCI
devices") added support for runtime PM on PCI devices, but it never
explicitly set the device to D0.

To make sure that devices are in D0 and that platform methods such as
_REG are called, explicitly set all devices into D0 during initialization.

Fixes: 967577b062417 ("PCI/PM: Keep runtime PM enabled for unbound PCI devices")
Signed-off-by: Mario Limonciello <mario.limonciello@xxxxxxx>
---
Through a bisect, we identified that this patch, merged in v6.16-rc1,
introduces a regression in vfio-pci for all Intel QuickAssist (QAT)
devices. Specifically, the VFIO_GROUP_GET_DEVICE_FD ioctl fails
with -EACCES.

Upon further investigation, the -EACCES appears to originate from the
rpm_resume() function, which is called by pm_runtime_resume_and_get()
within vfio_pci_core_enable(). Here is the exact call trace:

      drivers/base/power/runtime.c: rpm_resume()
      drivers/base/power/runtime.c: __pm_runtime_resume()
      include/linux/pm_runtime.h: pm_runtime_resume_and_get()
      drivers/vfio/pci/vfio_pci_core.c: vfio_pci_core_enable()
      drivers/vfio/pci/vfio_pci.c: vfio_pci_open_device()
      drivers/vfio/vfio_main.c: device->ops->open_device()
      drivers/vfio/vfio_main.c: vfio_df_device_first_open()
      drivers/vfio/vfio_main.c: vfio_df_open()
      drivers/vfio/group.c: vfio_df_group_open()
      drivers/vfio/group.c: vfio_device_open_file()
      drivers/vfio/group.c: vfio_group_ioctl_get_device_fd()
      drivers/vfio/group.c: vfio_group_fops_unl_ioctl(..., VFIO_GROUP_GET_DEVICE_FD, ...)
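
For reference, here is a rough paraphrase (not verbatim) of the check in
rpm_resume() that produces -EACCES when runtime PM was never enabled for
the device:

        /*
         * Paraphrased from drivers/base/power/runtime.c: if runtime PM is
         * still disabled for the device (disable_depth > 0) and the device
         * was never marked RPM_ACTIVE, a resume request is refused.
         */
        if (dev->power.runtime_error)
                retval = -EINVAL;
        else if (dev->power.disable_depth > 0 &&
                 dev->power.runtime_status != RPM_ACTIVE)
                retval = -EACCES;

In other words, -EACCES on this path suggests that runtime PM was never
(or is no longer) enabled for the device being opened.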

Is this a known issue that affects other devices? Is there any ongoing
discussion or fix in progress?

Thanks,

This is the first I've heard about an issue with that patch.

Does setting the vfio-pci module parameter disable_idle_d3 help?

If so, this feels like an imbalance of runtime PM calls in the VFIO
stack that this patch exposed.

Alex, any ideas?

Does the device in question have a PM capability?  I note that
4d4c10f763d7 makes the sequence:

        pm_runtime_forbid(&dev->dev);
        pm_runtime_set_active(&dev->dev);
        pm_runtime_enable(&dev->dev);

dependent on the presence of a PM capability.  The PM capability is
optional on SR-IOV VFs.  This feels like a bug in the original patch;
we should be able to use pm_runtime ops on a device without
specifically checking whether the device supports PCI PM.
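
As a rough sketch (simplified, not the actual code) of the pci_pm_init()
flow after 4d4c10f763d7, the early return for a missing PM capability
also skips the runtime PM setup:

        void pci_pm_init(struct pci_dev *dev)   /* simplified sketch */
        {
                int pm = pci_find_capability(dev, PCI_CAP_ID_PM);

                if (!pm)
                        return;                 /* SR-IOV VFs without a PM
                                                   capability bail out here... */

                /* ... parse the PM capability, set dev->pm_cap, etc. ... */

                pci_pm_power_up_and_verify_state(dev);
                pm_runtime_forbid(&dev->dev);       /* ...so this runtime PM */
                pm_runtime_set_active(&dev->dev);   /* setup never runs for  */
                pm_runtime_enable(&dev->dev);       /* capability-less VFs   */
        }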

vfio-pci also has a somewhat unique sequence versus other drivers: we
don't call pci_enable_device() until the user opens the device, but we
want to put the device into low power before that occurs.  Historically
the PCI core left devices in an unknown power state between driver uses,
so we've needed to manually move the device to D0 before calling
pm_runtime_allow() and pm_runtime_put() (see
vfio_pci_core_register_device()).  Possibly this is redundant now, but
we're using pci_set_power_state(), which shouldn't interact with
pm_runtime, so my initial guess is that we might be unbalanced because
this is a VF without a PM capability and we've missed the expected
pm_runtime initialization sequence.  Thanks,
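
For reference, that registration-time sequence looks roughly like the
following (paraphrased from vfio_pci_core_register_device(), with error
handling and details omitted; pdev stands for the struct pci_dev being
registered and disable_idle_d3 is the vfio-pci module parameter
mentioned above):

        if (!disable_idle_d3) {
                /* The PCI core may have left the device in an unknown
                 * power state; force it to D0 before handing control to
                 * runtime PM... */
                pci_set_power_state(pdev, PCI_D0);
                /* ...then allow runtime PM and drop our reference so the
                 * device can go to low power until a user opens it. */
                pm_runtime_allow(&pdev->dev);
                pm_runtime_put(&pdev->dev);
        }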

Yes, for Intel QAT, the issue occurs with a VF without the PM capability.

Thanks,


Got it, thanks Alex!  I think this should restore the previous behavior for devices without a PM capability while still fixing the problem the original patch needed to fix.


Seems to work for me too, thanks,




diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 3dd44d1ad829..c495c3c692f5 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -3221,15 +3221,15 @@ void pci_pm_init(struct pci_dev *dev)

        /* find PCI PM capability in list */
        pm = pci_find_capability(dev, PCI_CAP_ID_PM);
-       if (!pm)
-               return;
+       if (!pm)
+               goto poweron;
        /* Check device's ability to generate PME# */
        pci_read_config_word(dev, pm + PCI_PM_PMC, &pmc);

        if ((pmc & PCI_PM_CAP_VER_MASK) > 3) {
                pci_err(dev, "unsupported PM cap regs version (%u)\n",
                        pmc & PCI_PM_CAP_VER_MASK);
-               return;
+               goto poweron;
        }

        dev->pm_cap = pm;
@@ -3274,6 +3274,7 @@ void pci_pm_init(struct pci_dev *dev)
        pci_read_config_word(dev, PCI_STATUS, &status);
        if (status & PCI_STATUS_IMM_READY)
                dev->imm_ready = 1;
+poweron:
        pci_pm_power_up_and_verify_state(dev);
        pm_runtime_forbid(&dev->dev);
        pm_runtime_set_active(&dev->dev);
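
With that change, the tail of pci_pm_init() should behave roughly like
this for a device without a PM capability (a sketch of the resulting
flow, not the literal post-patch code):

        pm = pci_find_capability(dev, PCI_CAP_ID_PM);
        if (!pm)
                goto poweron;           /* no PM capability: skip the
                                           capability parsing but still
                                           fall through below */

        /* ... PM capability parsing, dev->pm_cap setup, etc. ... */

poweron:
        pci_pm_power_up_and_verify_state(dev);
        pm_runtime_forbid(&dev->dev);
        pm_runtime_set_active(&dev->dev);
        pm_runtime_enable(&dev->dev);

This way a VF without the PM capability again gets pm_runtime_set_active()
and pm_runtime_enable(), so pm_runtime_resume_and_get() in
vfio_pci_core_enable() should no longer fail with -EACCES.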

--
Alexey