Re: [PATCH] pci: Disable slot presence detection around bus reset

From: Alex Williamson
Date: Thu Feb 14 2013 - 22:54:14 EST


On Thu, 2013-02-14 at 16:47 -0700, Bjorn Helgaas wrote:
> On Thu, Feb 14, 2013 at 11:37 AM, Alex Williamson
> <alex.williamson@xxxxxxxxxx> wrote:
> > A bus reset can trigger a presence detection change and result in a
> > suprise hotplug. This is generally not what we want to happen when
> > trying to reset a device. Disable the presence detection control on
> > on bridges around bus reset.
>
> This is a really interesting situation, and I'm not quite ready to
> sign up to the idea that this is really a problem and that if it is,
> this is the way we want to fix it.
>
> What would happen if we *did* handle this as a hotplug event, with a
> removal followed by an add?
>
> The scheme where pci_reset_function() does "pci_save_state(dev);
> pci_dev_reset(dev); pci_restore_state(dev);" makes me nervous.
>
> We're saving and restoring some of PCI config space around the reset,
> but there's no guarantee that we're preserving *all* the important
> state in config space because I think devices can have non-architected
> device-specific things in config space that we don't know how to
> save/restore.
>
> Devices also have internal state not exposed via config space. That
> state is lost during the reset but can't be restored by
> pci_restore_state(). So it seems like pci_reset_function() is
> pretending to do something it can't really do reliably.
>
> If we make it so a reset is always handled as a remove+add, then we'll
> use a more generic path, and we'll get all the stuff you expect when
> initializing a new device -- resource assignment, IRQ setup, quirks,
> etc. Quirks in particular seem like something we want, but don't
> currently get with pci_reset_function().
>
> Oh, and the "disable presence detect" approach below only works for
> things below a PCIe bridge with native hotplug, right? I wonder what
> happens if we reset devices below a bridge using SHPC or acpiphp.

Triggering a remove+add is not useful for the way we use it today. The
users I'm aware of are KVM device assignment and VFIO, where we trigger
it in an attempt to get the device to a known state so that we have some
hope of repeatability. In those scenarios the reset is initiated by the
driver. The interface isn't meant to guarantee the device is returned
to an identical state as it was before reset. If it did, why would we
call it? We want to get to a state as near to power on, but still with
config programming, as we can.

Being driver directed, having the reset initiate a remove is pretty near
the last thing we want. That limits the scope of calling it to only
when the driver can readily release the device. If we have the device
attached to a guest or userspace driver, that's potentially a lot of
setup and teardown and effectively extending a surprise removal all the
way up the stack.

Obviously a bus reset is a big hammer and we do exhaust all the little
hammers of flr and pm reset before we try it, but in this case, we know
the device that's going away and with all likelihood, it's coming right
back at the same location. If we take the path of forcing a remove+add,
let's just remove it from the reset_function call path and we'll do
without reset for those devices. Thanks,

Alex

> > Signed-off-by: Alex Williamson <alex.williamson@xxxxxxxxxx>
> > ---
> > drivers/pci/pci.c | 29 ++++++++++++++++++++++++-----
> > 1 file changed, 24 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > index 5cb5820..c1f7d77 100644
> > --- a/drivers/pci/pci.c
> > +++ b/drivers/pci/pci.c
> > @@ -3229,8 +3229,8 @@ static int pci_pm_reset(struct pci_dev *dev, int probe)
> >
> > static int pci_parent_bus_reset(struct pci_dev *dev, int probe)
> > {
> > - u16 ctrl;
> > - struct pci_dev *pdev;
> > + u16 ctrl, flags, sltctl = 0;
> > + struct pci_dev *pdev, *bridge;
> >
> > if (pci_is_root_bus(dev->bus) || dev->subordinate || !dev->bus->self)
> > return -ENOTTY;
> > @@ -3242,15 +3242,34 @@ static int pci_parent_bus_reset(struct pci_dev *dev, int probe)
> > if (probe)
> > return 0;
> >
> > - pci_read_config_word(dev->bus->self, PCI_BRIDGE_CONTROL, &ctrl);
> > + bridge = dev->bus->self;
> > +
> > + /*
> > + * If the parent device supports a slot with presence detection
> > + * change enabled, holding the bus in reset can trigger that and
> > + * cause an unwanted surprise removal. Disable presence detection
> > + * around the bus reset.
> > + */
> > + pcie_capability_read_word(bridge, PCI_EXP_FLAGS, &flags);
> > + if (flags & PCI_EXP_FLAGS_SLOT) {
> > + pcie_capability_read_word(bridge, PCI_EXP_SLTCTL, &sltctl);
> > + if (sltctl & PCI_EXP_SLTCTL_PDCE)
> > + pcie_capability_write_word(bridge, PCI_EXP_SLTCTL,
> > + sltctl & ~PCI_EXP_SLTCTL_PDCE);
> > + }
> > +
> > + pci_read_config_word(bridge, PCI_BRIDGE_CONTROL, &ctrl);
> > ctrl |= PCI_BRIDGE_CTL_BUS_RESET;
> > - pci_write_config_word(dev->bus->self, PCI_BRIDGE_CONTROL, ctrl);
> > + pci_write_config_word(bridge, PCI_BRIDGE_CONTROL, ctrl);
> > msleep(100);
> >
> > ctrl &= ~PCI_BRIDGE_CTL_BUS_RESET;
> > - pci_write_config_word(dev->bus->self, PCI_BRIDGE_CONTROL, ctrl);
> > + pci_write_config_word(bridge, PCI_BRIDGE_CONTROL, ctrl);
> > msleep(100);
> >
> > + if (sltctl & PCI_EXP_SLTCTL_PDCE)
> > + pcie_capability_write_word(bridge, PCI_EXP_SLTCTL, sltctl);
> > +
> > return 0;
> > }
> >
> >



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/