Re: [PATCH v2] PCI: Explicitly put devices into D0 when initializing - Bug report

From: Mario Limonciello
Date: Wed Jun 11 2025 - 09:51:15 EST


On 6/11/2025 5:52 AM, Cabiddu, Giovanni wrote:
> Hi Mario, Bjorn and Alex,
>
> On Wed, Apr 23, 2025 at 11:31:32PM -0500, Mario Limonciello wrote:
>> From: Mario Limonciello <mario.limonciello@xxxxxxx>
>>
>> AMD BIOS team has root caused an issue that NVME storage failed to come
>> back from suspend to a lack of a call to _REG when NVME device was probed.
>>
>> commit 112a7f9c8edbf ("PCI/ACPI: Call _REG when transitioning D-states")
>> added support for calling _REG when transitioning D-states, but this only
>> works if the device actually "transitions" D-states.
>>
>> commit 967577b062417 ("PCI/PM: Keep runtime PM enabled for unbound PCI
>> devices") added support for runtime PM on PCI devices, but never actually
>> 'explicitly' sets the device to D0.
>>
>> To make sure that devices are in D0 and that platform methods such as
>> _REG are called, explicitly set all devices into D0 during initialization.
>>
>> Fixes: 967577b062417 ("PCI/PM: Keep runtime PM enabled for unbound PCI devices")
>> Signed-off-by: Mario Limonciello <mario.limonciello@xxxxxxx>
>> ---
> Through a bisect, we identified that this patch, in v6.16-rc1,
> introduces a regression on vfio-pci across all Intel QuickAssist (QAT)
> devices. Specifically, the ioctl VFIO_GROUP_GET_DEVICE_FD call fails
> with -EACCES.
>
> Upon further investigation, the -EACCES appears to originate from the
> rpm_resume() function, which is called by pm_runtime_resume_and_get()
> within vfio_pci_core_enable(). Here is the exact call trace:
>
> drivers/base/power/runtime.c: rpm_resume()
> drivers/base/power/runtime.c: __pm_runtime_resume()
> include/linux/pm_runtime.h: pm_runtime_resume_and_get()
> drivers/vfio/pci/vfio_pci_core.c: vfio_pci_core_enable()
> drivers/vfio/pci/vfio_pci.c: vfio_pci_open_device()
> drivers/vfio/vfio_main.c: device->ops->open_device()
> drivers/vfio/vfio_main.c: vfio_df_device_first_open()
> drivers/vfio/vfio_main.c: vfio_df_open()
> drivers/vfio/group.c: vfio_df_group_open()
> drivers/vfio/group.c: vfio_device_open_file()
> drivers/vfio/group.c: vfio_group_ioctl_get_device_fd()
> drivers/vfio/group.c: vfio_group_fops_unl_ioctl(..., VFIO_GROUP_GET_DEVICE_FD, ...)
>
> Is this a known issue that affects other devices? Is there any ongoing
> discussion or fix in progress?
>
> Thanks,


This is the first I've heard about an issue with that patch.

Does setting the VFIO parameter disable_idle_d3 help?

If so, this feels like an imbalance of runtime PM calls in the VFIO stack that this patch exposed.
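
For reference, one way rpm_resume() can return -EACCES is when runtime PM
is still disabled for the device, i.e. the disable depth is non-zero
because an enable call is missing or a disable call is unmatched. Below is
a minimal userspace sketch of that bookkeeping. It is not kernel code: the
model_* names and struct model_dev are hypothetical stand-ins, and the
real rpm_resume() checks more state than this.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical, simplified stand-in for the runtime PM state of a device. */
enum model_rpm_status { MODEL_RPM_ACTIVE, MODEL_RPM_SUSPENDED };

struct model_dev {
	int disable_depth;              /* > 0 means runtime PM is disabled */
	int usage_count;                /* references held by callers */
	enum model_rpm_status status;
};

/* Assumption: a new device starts disabled (depth 1) and suspended. */
void model_init(struct model_dev *dev)
{
	dev->disable_depth = 1;
	dev->usage_count = 0;
	dev->status = MODEL_RPM_SUSPENDED;
}

/* Each enable must pair with an earlier disable (or the initial state). */
void model_runtime_enable(struct model_dev *dev)
{
	assert(dev->disable_depth > 0);
	dev->disable_depth--;
}

void model_runtime_disable(struct model_dev *dev)
{
	dev->disable_depth++;
}

/* Models only the "runtime PM disabled" check of a resume request. */
int model_rpm_resume(struct model_dev *dev)
{
	if (dev->disable_depth > 0)
		return dev->status == MODEL_RPM_ACTIVE ? 1 : -EACCES;

	dev->status = MODEL_RPM_ACTIVE;
	return 0;
}

/* Models a resume-and-get helper: take a reference, drop it on failure. */
int model_resume_and_get(struct model_dev *dev)
{
	int ret;

	dev->usage_count++;
	ret = model_rpm_resume(dev);
	if (ret < 0) {
		dev->usage_count--;
		return ret;
	}
	return 0;
}
```

In this model, calling model_resume_and_get() on a freshly initialized
device fails with -EACCES and leaves the usage count untouched; after a
matching model_runtime_enable() the same call succeeds. That is the kind
of imbalance I have in mind, but it is only a guess until someone traces
the actual disable depth on an affected QAT device.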

Alex, any ideas?