Re: Linux 4.15-rc2: Regression in resume from ACPI S3

From: Bjorn Helgaas
Date: Wed Dec 13 2017 - 11:23:47 EST


[+cc linux-pci, linux-pm]

On Wed, Dec 13, 2017 at 04:57:56PM +0100, Thomas Gleixner wrote:
> So I was finally able to figure out what the hell is going on:
>
> Suspend:
>
> - The device suspend code puts the graphics card into a power
>   state != PCI_D0.
>
> - Offline non boot CPUs
>
> - Break interrupt affinity. Allocate new vector on CPU 0, compose and
>   write MSI message which ends up in:
>
> 	__pci_write_msi_msg(entry, msg)
> 	{
> 		if (dev->current_state != PCI_D0 || pci_dev_is_disconnected(dev)) {
> 			/* Don't touch the hardware now */
> 		} else {
> 			....
> 		}
> 		entry->msg = *msg;
> 	}
>
> So because the device is not in PCI_D0 the message is not written. It's
> written in the device resume path.

I'm not a PM guru, but this ordering seems fragile. If we offline
CPUs before re-targeting interrupts directed at those CPUs, aren't we
always going to be at risk of sending interrupts to an offline CPU?

Even if the device is now asleep and therefore should not generate an
interrupt, it seems like there's a window when the device returns to
PCI_D0 where it could generate an interrupt before we have a chance to
update the MSI message.
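
To make that window concrete, here is a toy userspace model of the
deferred write. It is a sketch under assumptions, not kernel code: the
toy_* names, the struct layout, and the CPU/vector numbers are all made
up for illustration. Only the "skip the hardware unless in D0, but
always cache the message" logic is taken from the snippet quoted above.

```c
/* Toy userspace model, NOT kernel code. All toy_* names are hypothetical. */
#include <assert.h>
#include <stdbool.h>

enum toy_power_state { TOY_D0, TOY_D3 };

struct toy_msi_msg {
	unsigned int cpu;	/* CPU the message targets */
	unsigned int vector;	/* vector on that CPU */
};

struct toy_dev {
	enum toy_power_state state;
	struct toy_msi_msg hw_msg;	/* what the device would actually send */
	struct toy_msi_msg cached_msg;	/* software copy, like entry->msg */
};

/* Mirrors the quoted logic: touch hardware only in D0, always cache. */
static void toy_write_msi_msg(struct toy_dev *dev,
			      const struct toy_msi_msg *msg)
{
	if (dev->state == TOY_D0)
		dev->hw_msg = *msg;
	dev->cached_msg = *msg;
}

/* True while the device still holds a message the kernel has replaced. */
static bool toy_msg_is_stale(const struct toy_dev *dev)
{
	return dev->hw_msg.cpu != dev->cached_msg.cpu ||
	       dev->hw_msg.vector != dev->cached_msg.vector;
}

/* Device resume path: push the cached message back to the hardware. */
static void toy_restore_msi(struct toy_dev *dev)
{
	assert(dev->state == TOY_D0);
	dev->hw_msg = dev->cached_msg;
}

/* Suspend ordering from the report; CPU and vector numbers are arbitrary. */
static struct toy_dev toy_suspend_sequence(void)
{
	struct toy_dev dev = {
		.state = TOY_D0,
		.hw_msg = { .cpu = 3, .vector = 0x30 },
		.cached_msg = { .cpu = 3, .vector = 0x30 },
	};
	struct toy_msi_msg retarget = { .cpu = 0, .vector = 0x31 };

	dev.state = TOY_D3;			/* device suspend */
	toy_write_msi_msg(&dev, &retarget);	/* affinity break, CPU 3 offline */
	return dev;
}
```

In this model the device still holds the CPU 3 message after
toy_suspend_sequence(), and it stays stale after dev.state goes back to
TOY_D0 until toy_restore_msi() runs. Whether real hardware can raise an
interrupt in that interval is the part I don't know.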

> Resume:
> [ 139.670446] ACPI: Low-level resume complete
> [ 139.670541] PM: Restoring platform NVS memory
> [ 139.672462] do_IRQ: 0.55 No irq handler for vector
> [ 139.672475] Enabling non-boot CPUs ...
>
> So the spurious interrupt happens early and way before the device resume
> code writes the new MSI message.
>
> I checked the behaviour on 4.14. The MSI write is delayed there in the same
> way, but there is no spurious interrupt. There is no interrupt coming in at
> all _BEFORE_ the device is put out of PCI_D0.
>
> And this certainly has nothing to do with the vector management changes,
> but I can't figure out yet what causes that spurious interrupt to be sent.
>
> Any ideas welcome.
>
> Thanks,
>
> tglx
>