Hi Mario, Bjorn and Alex,
On Wed, Apr 23, 2025 at 11:31:32PM -0500, Mario Limonciello wrote:
> From: Mario Limonciello <mario.limonciello@xxxxxxx>
>
> AMD BIOS team has root caused an issue that NVME storage failed to come
> back from suspend to a lack of a call to _REG when NVME device was probed.
>
> commit 112a7f9c8edbf ("PCI/ACPI: Call _REG when transitioning D-states")
> added support for calling _REG when transitioning D-states, but this only
> works if the device actually "transitions" D-states.
>
> commit 967577b062417 ("PCI/PM: Keep runtime PM enabled for unbound PCI
> devices") added support for runtime PM on PCI devices, but never actually
> 'explicitly' sets the device to D0.
>
> To make sure that devices are in D0 and that platform methods such as
> _REG are called, explicitly set all devices into D0 during initialization.
>
> Fixes: 967577b062417 ("PCI/PM: Keep runtime PM enabled for unbound PCI devices")
> Signed-off-by: Mario Limonciello <mario.limonciello@xxxxxxx>
> ---

Through a bisect, we identified that this patch, in v6.16-rc1,
introduces a regression on vfio-pci across all Intel QuickAssist (QAT)
devices. Specifically, the ioctl VFIO_GROUP_GET_DEVICE_FD call fails
with -EACCES.
Upon further investigation, the -EACCES appears to originate from the
rpm_resume() function, which is called by pm_runtime_resume_and_get()
within vfio_pci_core_enable(). Here is the exact call trace:

    drivers/base/power/runtime.c: rpm_resume()
    drivers/base/power/runtime.c: __pm_runtime_resume()
    include/linux/pm_runtime.h: pm_runtime_resume_and_get()
    drivers/vfio/pci/vfio_pci_core.c: vfio_pci_core_enable()
    drivers/vfio/pci/vfio_pci.c: vfio_pci_open_device()
    drivers/vfio/vfio_main.c: device->ops->open_device()
    drivers/vfio/vfio_main.c: vfio_df_device_first_open()
    drivers/vfio/vfio_main.c: vfio_df_open()
    drivers/vfio/group.c: vfio_df_group_open()
    drivers/vfio/group.c: vfio_device_open_file()
    drivers/vfio/group.c: vfio_group_ioctl_get_device_fd()
    drivers/vfio/group.c: vfio_group_fops_unl_ioctl(..., VFIO_GROUP_GET_DEVICE_FD, ...)

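
For context, as far as we can tell from recent kernels, rpm_resume()
returns -EACCES when runtime PM is disabled for the device
(power.disable_depth > 0), unless the device is already recorded as
active. Below is a simplified userspace model of that early check, not
the kernel code itself. The struct pm_model type and the
model_rpm_resume_precheck() helper are names made up for illustration,
and the field layout is an assumption that only mirrors the fields we
believe are involved:

```c
#include <errno.h>

/* Simplified stand-in for the kernel's runtime PM status values. */
enum rpm_status { RPM_ACTIVE, RPM_RESUMING, RPM_SUSPENDED, RPM_SUSPENDING };

/* Hypothetical model of the dev->power fields the check looks at. */
struct pm_model {
	int runtime_error;              /* non-zero after a failed callback */
	int disable_depth;              /* > 0 means runtime PM is disabled */
	enum rpm_status runtime_status; /* current status */
	enum rpm_status last_status;    /* status when runtime PM was disabled */
};

/*
 * Model of the check at the top of rpm_resume():
 *   -EINVAL  if a previous runtime PM error is pending,
 *   1        if runtime PM is disabled but the device is known active,
 *   -EACCES  if runtime PM is disabled otherwise,
 *   0        if the resume would proceed.
 */
static int model_rpm_resume_precheck(const struct pm_model *pm)
{
	if (pm->runtime_error)
		return -EINVAL;

	if (pm->disable_depth > 0) {
		if (pm->runtime_status == RPM_ACTIVE &&
		    pm->last_status == RPM_ACTIVE)
			return 1;
		return -EACCES;
	}

	return 0;
}
```

If this model is accurate, the failure we see would be consistent with
the QAT devices having runtime PM disabled and not marked active at the
time vfio_pci_core_enable() runs, but we have not confirmed that.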
Is this a known issue that affects other devices? Is there any ongoing
discussion or fix in progress?
Thanks,