Re: [PATCH v2] PCI: Fix no-op wait after secondary bus reset

From: Sheng Bi
Date: Sat May 21 2022 - 04:36:31 EST


On Fri, May 20, 2022 at 2:41 PM Lukas Wunner <lukas@xxxxxxxxx> wrote:
>
> On Wed, May 18, 2022 at 07:54:32PM +0800, Sheng Bi wrote:
> > +static int pci_bridge_secondary_bus_wait(struct pci_dev *bridge, int timeout)
> > +{
> > + struct pci_dev *dev;
> > + int delay = 0;
> > +
> > + if (!bridge->subordinate || list_empty(&bridge->subordinate->devices))
> > + return 0;
> > +
> > + list_for_each_entry(dev, &bridge->subordinate->devices, bus_list) {
> > + while (!pci_device_is_present(dev)) {
> > + if (delay > timeout) {
> > + pci_warn(dev, "not ready %dms after secondary bus reset; giving up\n",
> > + delay);
> > + return -ENOTTY;
> > + }
> > +
> > + msleep(20);
> > + delay += 20;
> > + }
> > +
> > + if (delay > 1000)
> > + pci_info(dev, "ready %dms after secondary bus reset\n",
> > + delay);
> > + }
> > +
> > + return 0;
> > +}
>
> An alternative approach you may want to consider is to call
> pci_dev_wait() in the list_for_each_entry loop, but instead of
> passing it a constant timeout you'd pass the remaining time.
>
> Get the current time before and after each pci_dev_wait() call
> from "jiffies", calculate the difference, convert to msecs with
> jiffies_to_msecs() and subtract from the "timeout" parameter
> passed in by the caller, then simply pass "timeout" to each
> pci_dev_wait() call.

Thanks for your proposal, which can avoid doing duplicated things as
pci_dev_wait().

If so, I also want to align the polling things mentioned in the
question from Alex, since pci_dev_wait() is also used for reset
functions other than SBR. To Bjorn, Alex, Lucas, how do you think if
we need to change the polling in pci_dev_wait() to 20ms intervals, or
keep binary exponential back-off with probable unexpected extra
timeout delay.

>
> As a side note, traversing the bus list normally requires
> holding the pci_bus_sem for reading. But it's probably unlikely
> that devices are added/removed concurrently to a bus reset
> and we're doing it wrong pretty much everywhere in the
> PCI reset code, so...

Yeah... I think that is why I saw different coding there. I would
prefer a separate thread for estimating which ones are real risks.

>
> (I fixed up one of the reset functions with 10791141a6cf,
> but plenty of others remain...)
>
> Thanks,
>
> Lukas