On Thu, 14 Aug 2025 09:33:47 -0700Yes, exactly.
Farhan Ali <alifm@xxxxxxxxxxxxx> wrote:
On 8/14/2025 6:12 AM, Niklas Schnelle wrote:It's too late because we've re-written the error value back to config
On Wed, 2025-08-13 at 16:56 -0600, Alex Williamson wrote:Yeah I did think of adding something like s390x CLP reset as part of the
On Wed, 13 Aug 2025 14:52:24 -0700--- snip ---
Farhan Ali <alifm@xxxxxxxxxxxxx> wrote:
On 8/13/2025 1:30 PM, Alex Williamson wrote:
On Wed, 13 Aug 2025 10:08:19 -0700
Farhan Ali <alifm@xxxxxxxxxxxxx> wrote:
For zPCI devices we should drive a platform specific function reset
as part of VFIO_DEVICE_RESET. This reset is needed recover a zPCI device
in error state.
Signed-off-by: Farhan Ali <alifm@xxxxxxxxxxxxx>
---
arch/s390/pci/pci.c | 1 +
drivers/vfio/pci/vfio_pci_core.c | 4 ++++
drivers/vfio/pci/vfio_pci_priv.h | 5 ++++
drivers/vfio/pci/vfio_pci_zdev.c | 39 ++++++++++++++++++++++++++++++++
4 files changed, 49 insertions(+)
I agree with you Alex. Still trying to figure out what's needed forWe generally rely on the abstraction of pci_reset_function() to selectAre you suggesting to move this to an arch specific function? One thing+int vfio_pci_zdev_reset(struct vfio_pci_core_device *vdev)This looks like it should be a device or arch specific reset
+{
+ struct zpci_dev *zdev = to_zpci(vdev->pdev);
+ int rc = -EIO;
+
+ if (!zdev)
+ return -ENODEV;
+
+ /*
+ * If we can't get the zdev->state_lock the device state is
+ * currently undergoing a transition and we bail out - just
+ * the same as if the device's state is not configured at all.
+ */
+ if (!mutex_trylock(&zdev->state_lock))
+ return rc;
+
+ /* We can reset only if the function is configured */
+ if (zdev->state != ZPCI_FN_STATE_CONFIGURED)
+ goto out;
+
+ rc = zpci_hot_reset_device(zdev);
+ if (rc != 0)
+ goto out;
+
+ if (!vdev->pci_saved_state) {
+ pci_err(vdev->pdev, "No saved available for the device");
+ rc = -EIO;
+ goto out;
+ }
+
+ pci_dev_lock(vdev->pdev);
+ pci_load_saved_state(vdev->pdev, vdev->pci_saved_state);
+ pci_restore_state(vdev->pdev);
+ pci_dev_unlock(vdev->pdev);
+out:
+ mutex_unlock(&zdev->state_lock);
+ return rc;
+}
implemented in drivers/pci, not vfio. Thanks,
Alex
we need to do after the zpci_hot_reset_device, is to correctly restore
the config space of the device. And for vfio-pci bound devices we want
to restore the state of the device to when it was initially opened.
the correct type of reset for a function scope reset. We've gone to
quite a bit of effort to implement all device specific resets and
quirks in the PCI core to be re-used across the kernel.
Calling zpci_hot_reset_device() directly seems contradictory to those
efforts. Should pci_reset_function() call this universally on s390x
rather than providing access to FLR/PM/SBR reset?
this. We already do zpci_hot_reset_device() in reset_slot() from the
s390_pci_hpc.c hotplug slot driver and that does get called via
pci_reset_hotplug_slot() and pci_reset_function(). There are a few
problems though that meant it didn't work for Farhan but I agree maybe
we can fix them for the general case. For one pci_reset_function()
via DEVICE_RESET first tries FLR but that won't work with the device in
the error state and MMIO blocked. Sadly __pci_reset_function_locked()
then concludes that other resets also won't work. So that's something
we might want to improve in general, for example maybe we need
something more like pci_dev_acpi_reset() with higher priority than FLR.
reset methods. AFAIU the s390x CLP reset is similar to ACPI _RST. But
that would introduce s390x specific code in pci core common code.
Now for pci_reset_hotplug_slot() via VFIO_DEVICE_PCI_HOT_RESET I'm notVFIO_DEVICE_PCI_HOT_RESET would have been sufficient interface for
sure why that won't work as is. @Farhan do you know?
majority of PCI devices on s390x as that would drive a bus reset. It was
sufficient as most devices were single bus devices. But in the latest
generation of machines (z17) we expose true SR-IOV and an OS can have
access to both PF and VFs and so these are on the same bus and can have
different ownership based on what is bound to vfio-pci.
My thinking for extending VFIO_DEVICE_RESET is because AFAIU its a per
function reset mechanism, which maps well with what our architecture
provides. On s390x we can drive a per function reset (via firmware)
through the CLP instruction driven by the zpci_hot_reset_device(). And
doing it as vfio zpci specific function would confine the s390x logic.
The reason I abandoned reset_done() callback idea is because its notWhy is itI think an alternative to that, which Farhan actually had in the
universally correct here given the ioctl previously made use of
standard reset mechanisms?
The DEVICE_RESET ioctl is simply an in-place reset of the device,
without restoring the original device state. So we're also subtly
changing that behavior here, presumably because we're targeting the
specific error recovery case. Have you considered how this might
break non-error-recovery use cases?
I wonder if we want a different reset mechanism for this use case
rather than these subtle semantic changes.
previous internal version, is to implement
pci_error_handlers::reset_done() and do the pci_load_saved_state()
there. That would only affect the error recovery case leaving other
cases alone.
Thanks,
Niklas
sufficient to recover the device correctly. Today before driving a reset
we save the state of the device. When a device is in error state, any
pci load/store (on s390x they are actual instructions :)) to config
space would return an error value (0xffffffff). We don't have any checks
in pci_save_state to prevent storing error values. And after a reset
when we try to restore the config space (pci_dev_restore) we try to
write the error value and this can be problematic. By the time the
reset_done() callback is invoked, its already too late.
space and as a result the device is broken?
What if
pci_restore_state() were a little smarter to detect that it has bad
read data from pci_save_state() and only restores state based on kernel
data? Would that leave the device in a functional state that
reset_done() could restore the original saved state and push it out to
the device?
Yeah I can do that. I think adding some validation checks to the FLR/PM callbacks wouldn't be a bad idea if its acceptable for PCI maintainers.@Alex,Unfortunately I was short sighted on VFIO_DEVICE_RESET and it's the one
I am open to ideas/suggestions on this. Do we think we need a separate
VFIO ioctl to drive this or a new reset mechanism as Niklas suggested?
ioctl that doesn't have any flags, so it's not very extensible.
Can we do more of the above, ie. enlighten the FLR/PM reset callbacks to
return -ENOTTY if the device is in an error state and config space is
returning -1 such that we fall through to a slot reset that doesn't
care how broken the device is and you auto-magically get the zpci
function you want? Follow-up with pushing the original state in
reset_done()? Thanks,
Alex