Re: [PATCH 1/1] nvme-pci: fix hang during error recovery when the PCI device is isolated

From: Niklas Schnelle
Date: Tue Jul 12 2022 - 10:06:15 EST


On Tue, 2022-07-12 at 15:49 +0200, Hannes Reinecke wrote:
> On 7/12/22 14:44, Niklas Schnelle wrote:
> > On s390 and powerpc PCI devices are isolated when an error is detected
> > and driver->err_handler->error_detected is called with an inaccessible
> > PCI device and PCI channel state set to pci_channel_io_frozen
> > (see Step 1 in Documentation/PCI/pci-error-recovery.rst).
> >
> > In the case of NVMe devices nvme_error_detected() then calls
> > nvme_dev_disable(dev, false) and requests a reset. After a successful
> > reset the device is accessible again and nvme_slot_reset() resets the
> > controller and queues nvme_reset_work() which then recovers the
> > controller.
> >
> > Since commit b98235d3a471 ("nvme-pci: harden drive presence detect in
> > nvme_dev_disable()") however nvme_dev_disable() no longer freezes the
> > queues if pci_device_is_present() returns false. This is the case for an
> > isolated PCI device. In principle this makes sense as there are no
> > accessible hardware queues to run. The problem though is that for
> > a previously live reset controller with online queues nvme_reset_work()
> > calls nvme_wait_freeze() which, without the freeze having been
> > initiated, then hangs forever. Fix this by starting the freeze in
> > nvme_slot_reset() which is the earliest point where we know the device
> > should be accessible again.
> >
> > Fixes: b98235d3a471 ("nvme-pci: harden drive presence detect in nvme_dev_disable()")
> > Signed-off-by: Niklas Schnelle <schnelle@xxxxxxxxxxxxx>
> > ---
> > drivers/nvme/host/pci.c | 1 +
> > 1 file changed, 1 insertion(+)
> >
> > diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> > index 193b44755662..7c0c61b74c30 100644
> > --- a/drivers/nvme/host/pci.c
> > +++ b/drivers/nvme/host/pci.c
> > @@ -3399,6 +3399,7 @@ static pci_ers_result_t nvme_slot_reset(struct pci_dev *pdev)
> > dev_info(dev->ctrl.device, "restart after slot reset\n");
> > pci_restore_state(pdev);
> > nvme_reset_ctrl(&dev->ctrl);
> > + nvme_start_freeze(&dev->ctrl);
> > return PCI_ERS_RESULT_RECOVERED;
> > }
> >
> I am not sure if that's the right fix.
> From your description the hang occurs as nvme_reset_ctrl() is calling
> nvme_wait_freeze() without an corresponding nvme_start_freeze().
> So why are you calling it _after_ the call to nvme_reset_ctrl()?
>
> Cheers,
>
> Hannes

Hmm, the call chain that used to have the nvme_start_freeze()
is nvme_error_detected()->nvme_dev_disable()->nvme_start_freeze().

With the referenced commit that nvme_start_freeze() no longer happens
because the nvme_error_detected callback occurs before the reset when
the device is still inaccessible (as mentioned Step 1 in
Documentation/PCI/pci-error-recovery.rst).

There is indeed another nvme_dev_disable() call in nvme_reset_work()
but in the nvme_slot_reset() path this comes before
pci_enable_device_mem() was called so that also doesn't do the
nvme_start_freeze(). I also tried doing the nvme_start_freeze() there
but at least in my test that broke /sys/bus/pci/devices/<dev>/reset,
though not entirely sure why since nvme_start_freeze() looks like a no-
op when done a second time. I'm also not sure though what the right
approach is here though.