Re: [PATCH V2] PCI/portdrv: do not disable device on reboot/shutdown

From: Bjorn Helgaas
Date: Fri May 25 2018 - 07:00:46 EST


On Wed, May 23, 2018 at 06:57:18PM -0400, Sinan Kaya wrote:
> On 5/23/2018 5:32 PM, Bjorn Helgaas wrote:
> >
> > The crash seems to indicate that the hpsa device attempted a DMA after
> > we cleared the Root Port's PCI_COMMAND_MASTER, which means
> > hpsa_shutdown() didn't stop DMA from the device (it looks like *most*
> > shutdown methods don't disable device DMA, so it's in good company).
>
> All drivers are expected to shutdown DMA and interrupts in their shutdown()
> routines. They can skip removing threads, data structures etc. but DMA and
> interrupt disabling are required. This is the difference between shutdown()
> and remove() callbacks.
>
> If you see that this is not being done in HPSA, then that is where the
> bugfix should be.
>
> Counter argument is that if shutdown() is not implemented, at least remove()
> should be called. Expecting all drivers to implement shutdown() callbacks
> is just bad by design in my opinion.
>
> Code should have fallen back to remove() if shutdown() doesn't exist.
> I can propose a patch for this but this is yet another story to chase.

That sounds like a reasonable idea, and it is definitely another can
of worms. I looked briefly at some of the .shutdown() cases:

- device_shutdown() doesn't fall back to remove().

- It looks like most bus_types don't implement .shutdown() at all (I
didn't look at them all).

- Of the bus_types that do implement .shutdown(), most do not fall
back to .remove(). ps3_system_bus_type() is an example of one
that *does* fall back to a driver's .remove() if there is no
.shutdown().

Implement shutdown (no fallback unless indicated):

ecard_bus_type
gio_bus_type
ps3_system_bus_type # does fallback to remove
ibmebus_bus_type
isa_bus_type
platform_bus_type # not direct implementation
fmc_bus_type # fmc_shutdown() looks spurious
mipi_dsi_bus_type
hv_bus

> >> This has been found to cause crashes on HP DL360 Gen9 machines during
> >> reboot. Besides, kexec is already clearing the bus master bit in
> >> pci_device_shutdown() after all PCI drivers are removed.
> >
> > The original path was:
> >
> > pci_device_shutdown(hpsa)
> > drv->shutdown
> > hpsa_shutdown # hpsa_pci_driver.shutdown
> > ...
> > pci_device_shutdown(RP) # root port
> > drv->shutdown
> > pcie_portdrv_remove # pcie_portdriver.shutdown
> > pcie_port_device_remove
> > pci_disable_device
> > do_pci_disable_device
> > # clear RP PCI_COMMAND_MASTER
> > if (kexec)
> > pci_clear_master(RP)
> > # clear RP PCI_COMMAND_MASTER
> >
> > If I understand correctly, the new path after this patch is:
> >
> > pci_device_shutdown(hpsa)
> > drv->shutdown
> > hpsa_shutdown # hpsa_pci_driver.shutdown
> > ...
> > pci_device_shutdown(RP) # root port
> > drv->shutdown
> > pcie_portdrv_shutdown # pcie_portdriver.shutdown
> > __pcie_portdrv_remove(RP, false)
> > pcie_port_device_remove(RP, false)
> > # do NOT clear RP PCI_COMMAND_MASTER
>
> yup
>
> > if (kexec)
> > pci_clear_master(RP)
> > # clear RP PCI_COMMAND_MASTER
> >
> > I guess this patch avoids the panic during reboot because we're not in
> > the kexec path, so we never clear PCI_COMMAND_MASTER for the Root
> > Port, so the hpsa device can DMA happily until the lights go out.
> >
> > But DMA continuing for some random amount of time before the reboot or
> > shutdown happens makes me a little queasy. That doesn't sound safe.
> > The more I think about this, the more confused I get. What am I
> > missing?
>
> see above.
>
> >
> >> Just remove the extra clear in shutdown path by seperating the remove and
> >> shutdown APIs in the PORTDRV.
> >>
> >> static pci_ers_result_t pcie_portdrv_error_detected(struct pci_dev *dev,
> >> @@ -218,7 +228,7 @@ static struct pci_driver pcie_portdriver = {
> >>
> >> .probe = pcie_portdrv_probe,
> >> .remove = pcie_portdrv_remove,
> >> - .shutdown = pcie_portdrv_remove,
> >> + .shutdown = pcie_portdrv_shutdown,
> >
> > What are the circumstances when we call .remove() vs .shutdown()?
> >
> > I guess the main (maybe only) way to call .remove() is to hot-remove
> > the port? And .shutdown() is basically used in the reboot and kexec
> > paths?
>
> Correct. shutdown() is only called during reboot/shutdown calls. If you echo
> 1 into the remove file, remove() gets called. Handy for hotplug use cases.
> It needs to be the exact opposite of the probe. It needs to clean up resources
> etc. and have the HW in a state where it can be reinitialized via probe again.
>
> >
> >> .err_handler = &pcie_portdrv_err_handler,
> >>
> >> --
> >> 2.7.4
> >>
> >
>
>
> --
> Sinan Kaya
> Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
> Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.