RE: [PATCH v2] PCI: PM: Move to D0 before calling pci_legacy_resume_early()

From: Dexuan Cui
Date: Tue Oct 08 2019 - 20:16:23 EST


> From: Bjorn Helgaas <helgaas@xxxxxxxxxx>
> Sent: Tuesday, October 8, 2019 12:56 PM
> ...
> Wordsmithing nit: what the patch does is not "fix the error message";
> what it does is fix the *problem*, i.e., the fact that we can't
> operate the device because we can't enable MSI-X. The message is only
> a symptom.

I totally agree. :-)

> IIUC the relevant part of the system hibernation sequence is:
>
>   pci_pm_freeze_noirq
>   pci_pm_thaw_noirq
>   pci_pm_thaw
>
> And the execution flow is:
>
>   pci_pm_freeze_noirq
>     if (pci_has_legacy_pm_support(pci_dev)) # true for mlx4
>       pci_legacy_suspend_late(dev, PMSG_FREEZE)
>         pci_pm_set_unknown_state
>           dev->current_state = PCI_UNKNOWN  # <---
>   pci_pm_thaw_noirq
>     if (pci_has_legacy_pm_support(pci_dev)) # true
>       pci_legacy_resume_early(dev)          # noop; mlx4 doesn't implement
>   pci_pm_thaw                               # returns -95 EOPNOTSUPP
>     if (pci_has_legacy_pm_support(pci_dev)) # true
>       pci_legacy_resume
>         drv->resume
>           mlx4_resume                       # mlx4_driver.resume (legacy)
>             mlx4_load_one
>               mlx4_enable_msi_x
>                 pci_enable_msix_range
>                   __pci_enable_msix_range
>                     __pci_enable_msix
>                       if (!pci_msi_supported())
>                         if (dev->current_state != PCI_D0)  # <---
>                           return 0
>                         return -EINVAL
>               err = -EOPNOTSUPP
>               "INTx is not supported ..."
>
> (These are just my notes; you don't need to put them all into the
> commit message. I'm just sharing them in case I'm not understanding
> correctly.)

Yes, these notes are accurate.
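
As a rough illustration of the flow in these notes, here is a minimal
userspace sketch. It is not kernel code: struct fake_pci_dev, the fake_*
functions and the enum are hypothetical stand-ins, and the model keeps only
the current_state check and the two error codes that the notes mention.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's power states (not the real values). */
enum fake_power_state { FAKE_D0, FAKE_UNKNOWN };

struct fake_pci_dev {
	enum fake_power_state current_state;
};

/* Mirrors the check in the notes: MSI-X is refused unless the state is D0. */
static bool fake_msi_supported(const struct fake_pci_dev *dev)
{
	return dev->current_state == FAKE_D0;
}

static int fake_enable_msix(const struct fake_pci_dev *dev)
{
	return fake_msi_supported(dev) ? 0 : -EINVAL;
}

/* Models the driver's resume: without MSI-X it gives up with -EOPNOTSUPP. */
static int fake_driver_resume(const struct fake_pci_dev *dev)
{
	return fake_enable_msix(dev) < 0 ? -EOPNOTSUPP : 0;
}

/* Runs freeze_noirq -> thaw_noirq -> thaw on the model device. */
int fake_freeze_then_thaw(bool force_d0_in_thaw_noirq)
{
	struct fake_pci_dev dev = { .current_state = FAKE_D0 };

	dev.current_state = FAKE_UNKNOWN;	/* freeze_noirq, legacy path */
	if (force_d0_in_thaw_noirq)
		dev.current_state = FAKE_D0;	/* the step the patch moves earlier */
	return fake_driver_resume(&dev);	/* thaw -> drv->resume */
}

/* Small self-check of the two outcomes. */
void fake_self_check(void)
{
	assert(fake_freeze_then_thaw(false) == -EOPNOTSUPP);
	assert(fake_freeze_then_thaw(true) == 0);
}
```

In this model, the resume fails only when thaw_noirq returns before the state
is set back to D0, which is the situation the notes describe for mlx4.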

> > > > > When the system starts again, a fresh kernel starts to run, and when the
> > > > > kernel detects that a hibernation image was saved, the kernel "quiesces"
> > > > > the devices, and then "restores" the devices from the saved image. In this
> > > > > path:
> > > > > device_resume_noirq() -> ... ->
> > > > > pci_pm_restore_noirq() ->
> > > > > pci_pm_default_resume_early() ->
> > > > > pci_power_up() moves the device states back to PCI_D0. This path is
> > > > > not broken and doesn't need my patch.
> > > > >
>
> The cc list suggests that this might be a fix for a user-reported
> problem. Is there a launchpad or similar link you could include here?

I guess I'm the first one to notice the issue; AFAIK there is no bug link.

The hibernation process usually saves the system state to a local disk (before
the system is powered off), and the Mellanox NIC is not needed during that
process, so it is not a real problem that the NIC cannot work between
pci_pm_thaw() and power_down(). This may explain why nobody else noticed the
issue. I happened to see the error message, and hence investigated it.

> Should this be marked for stable?

Yes, I think it should be marked for stable.

> > > > > --- a/drivers/pci/pci-driver.c
> > > > > +++ b/drivers/pci/pci-driver.c
> > > > > @@ -1074,15 +1074,16 @@ static int pci_pm_thaw_noirq(struct device *dev)
> > > > >  			return error;
> > > > >  	}
> > > > >
> > > > > -	if (pci_has_legacy_pm_support(pci_dev))
> > > > > -		return pci_legacy_resume_early(dev);
> > > > > -
> > > > >  	/*
> > > > >  	 * pci_restore_state() requires the device to be in D0 (because of MSI
> > > > >  	 * restoration among other things), so force it into D0 in case the
> > > > >  	 * driver's "freeze" callbacks put it into a low-power state directly.
> > > > >  	 */
> > > > >  	pci_set_power_state(pci_dev, PCI_D0);
> > > > > +
> > > > > +	if (pci_has_legacy_pm_support(pci_dev))
> > > > > +		return pci_legacy_resume_early(dev);
> > > > > +
> > > > >  	pci_restore_state(pci_dev);
> > > > >
> > > > >  	if (drv && drv->pm && drv->pm->thaw_noirq)
> > > > > --
> > > > > 2.19.1
> > > > >
> > The patch looks reasonable to me, but the comment above the
> > pci_set_power_state() call needs to be updated too IMO.
>
> Hmm.
>
> 1) pci_restore_state() mainly writes config space, which doesn't
> require the device to be in D0. The only thing I see that would
> require D0 is the MSI-X MMIO space, so to be more specific, the
> comment could say "restoring the MSI-X *MMIO* state requires the
> device to be in D0".
>
> But I think you meant some other comment change. Did you mean
> something along the lines of "a legacy drv->resume_early() callback
> and pci_restore_state() both require the device to be in D0"?
>
> If something else, maybe you could propose some text?
>
> 2) I assume pci_pm_thaw_noirq() should leave the device in a
> functionally equivalent state, whether it uses legacy PM or not. Do
> we want something like the patch below instead? If we *do* want to
> skip pci_restore_state() for legacy PM, maybe we should add a comment.
>
> 3) Documentation/power/pci.rst says:
>
> ... devices have to be brought back to the fully functional
> state ...
>
> pci_pm_thaw_noirq() ... doesn't put the device into the full power
> state and doesn't attempt to restore its standard configuration
> registers.
>
> That doesn't seem consistent, and it looks like pci_pm_thaw_noirq()
> actually *does* put the device in full power (D0) state and restore
> config registers.

I would leave these questions to Rafael.

> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> index a8124e47bf6e..30c721fd6bcf 100644
> --- a/drivers/pci/pci-driver.c
> +++ b/drivers/pci/pci-driver.c
> @@ -1068,7 +1068,7 @@ static int pci_pm_thaw_noirq(struct device *dev)
>  {
>  	struct pci_dev *pci_dev = to_pci_dev(dev);
>  	struct device_driver *drv = dev->driver;
> -	int error = 0;
> +	int error;
>
>  	if (pcibios_pm_ops.thaw_noirq) {
>  		error = pcibios_pm_ops.thaw_noirq(dev);
> @@ -1076,9 +1076,6 @@ static int pci_pm_thaw_noirq(struct device *dev)
>  			return error;
>  	}
>
> -	if (pci_has_legacy_pm_support(pci_dev))
> -		return pci_legacy_resume_early(dev);
> -
>  	/*
>  	 * pci_restore_state() requires the device to be in D0 (because of MSI
>  	 * restoration among other things), so force it into D0 in case the
> @@ -1087,10 +1084,13 @@ static int pci_pm_thaw_noirq(struct device *dev)
>  	pci_set_power_state(pci_dev, PCI_D0);
>  	pci_restore_state(pci_dev);
>
> +	if (pci_has_legacy_pm_support(pci_dev))
> +		return pci_legacy_resume_early(dev);
> +
>  	if (drv && drv->pm && drv->pm->thaw_noirq)
> -		error = drv->pm->thaw_noirq(dev);
> +		return drv->pm->thaw_noirq(dev);
>
> -	return error;
> +	return 0;
>  }
>
>  static int pci_pm_thaw(struct device *dev)

The only real difference from my patch is that you moved

+	if (pci_has_legacy_pm_support(pci_dev))
+		return pci_legacy_resume_early(dev);

to after the line "pci_restore_state(pci_dev);"

This change looks good to me, and should also resolve the error message I saw.
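
For comparison, here is a minimal userspace sketch of where the legacy early
return sits in the three orderings discussed in this thread. It is only a
model under my own assumptions: enum thaw_order, struct thaw_result and
model_thaw_noirq() are hypothetical names, and the two flags only record
which steps ran before the legacy path returned.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for the three orderings of pci_pm_thaw_noirq(). */
enum thaw_order {
	ORDER_ORIGINAL,		/* legacy return before pci_set_power_state() */
	ORDER_V2,		/* legacy return after pci_set_power_state() */
	ORDER_ALTERNATIVE	/* legacy return after pci_restore_state() */
};

/* Which steps have run by the time the function returns. */
struct thaw_result {
	bool in_d0;
	bool config_restored;
};

struct thaw_result model_thaw_noirq(enum thaw_order order, bool legacy_pm)
{
	struct thaw_result r = { false, false };

	if (order == ORDER_ORIGINAL && legacy_pm)
		return r;		/* neither step has run */

	r.in_d0 = true;			/* pci_set_power_state(dev, PCI_D0) */
	if (order == ORDER_V2 && legacy_pm)
		return r;		/* D0, but config space not restored */

	r.config_restored = true;	/* pci_restore_state(dev) */
	return r;
}

/* Small self-check of the legacy-PM case for each ordering. */
void model_self_check(void)
{
	assert(!model_thaw_noirq(ORDER_ORIGINAL, true).in_d0);
	assert(model_thaw_noirq(ORDER_V2, true).in_d0);
	assert(!model_thaw_noirq(ORDER_V2, true).config_restored);
	assert(model_thaw_noirq(ORDER_ALTERNATIVE, true).config_restored);
}
```

In this model, both reorderings leave a legacy-PM device in D0, and only the
alternative ordering also restores its config space first. For a non-legacy
driver all three orderings behave the same.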

Thanks,
-- Dexuan