Re: Linux 4.15-rc2: Regression in resume from ACPI S3

From: Thomas Gleixner
Date: Wed Dec 13 2017 - 10:58:07 EST


So I was finally able to figure out what the hell is going on:

Suspend:

- The device suspend code puts the graphics card into a power
state != PCI_D0.

- Offline non-boot CPUs

- Break interrupt affinity. Allocate new vector on CPU 0, compose and
write MSI message which ends up in:

	__pci_write_msi_msg(entry, msg)
	{
		if (dev->current_state != PCI_D0 || pci_dev_is_disconnected(dev)) {
			/* Don't touch the hardware now */
		} else {
			....
		}
		entry->msg = *msg;
	}

So because the device is not in PCI_D0, the message is not written to the
hardware. It is only cached in entry->msg and written later, in the device
resume path.
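
To make the deferred write concrete, here is a minimal userspace sketch of
the behaviour described above. It is not kernel code: struct model_dev,
model_write_msi_msg() and model_resume() are hypothetical names, the enum is
a stand-in for the kernel's power state values, and the
pci_dev_is_disconnected() check is left out. It only assumes what the excerpt
shows, namely that the hardware is skipped outside PCI_D0 while the message
is always cached.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the kernel types. */
enum model_power_state { PCI_D0 = 0, PCI_D3hot = 3 };

struct msi_msg {
	uint32_t address_lo;
	uint32_t address_hi;
	uint32_t data;
};

struct model_dev {
	enum model_power_state current_state;
	struct msi_msg hw_msg;		/* what the device registers hold */
	struct msi_msg cached_msg;	/* plays the role of entry->msg */
};

/*
 * Same shape as the excerpt: leave the hardware alone unless the device
 * is in PCI_D0, but always remember the message.
 * Returns true if the (modelled) hardware was written.
 */
static bool model_write_msi_msg(struct model_dev *dev,
				const struct msi_msg *msg)
{
	bool touched_hw = false;

	if (dev->current_state == PCI_D0) {
		dev->hw_msg = *msg;
		touched_hw = true;
	}
	dev->cached_msg = *msg;
	return touched_hw;
}

/* Assumed resume step: back to PCI_D0, then replay the cached message. */
static void model_resume(struct model_dev *dev)
{
	dev->current_state = PCI_D0;
	dev->hw_msg = dev->cached_msg;
}
```

Between the model_write_msi_msg() call made while the device is not in
PCI_D0 and the later model_resume(), hw_msg and cached_msg disagree. That is
the window the rest of this mail is about.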

Resume:

[ 139.670446] ACPI: Low-level resume complete
[ 139.670541] PM: Restoring platform NVS memory
[ 139.672462] do_IRQ: 0.55 No irq handler for vector
[ 139.672475] Enabling non-boot CPUs ...

So the spurious interrupt happens early and way before the device resume
code writes the new MSI message.
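
For readers unfamiliar with that log line, the following is a rough model of
how such a message comes about. It reads "0.55" as CPU 0, vector 55, which is
the CPU.vector layout x86 do_IRQ() prints as far as I can tell. Everything
else is an assumption for illustration: the single table, model_do_irq() and
model_noop_handler() are hypothetical, and the real kernel keeps a per-CPU
table of interrupt descriptors, not function pointers.

```c
#include <stddef.h>
#include <stdio.h>

#define MODEL_NR_VECTORS 256

typedef void (*model_handler_t)(void);

/* The real kernel has one such table per CPU; one CPU is enough here. */
static model_handler_t model_vector_table[MODEL_NR_VECTORS];

/* Placeholder handler so a vector can be marked as "in use". */
static void model_noop_handler(void)
{
}

/*
 * Dispatch an interrupt that arrived on @vector of @cpu.
 * Returns 1 if a handler was installed and ran, 0 otherwise.
 * In the unhandled case @buf receives a message in the style of the log.
 */
static int model_do_irq(int cpu, unsigned int vector, char *buf, size_t len)
{
	model_handler_t handler = NULL;

	if (vector < MODEL_NR_VECTORS)
		handler = model_vector_table[vector];

	if (handler) {
		handler();
		if (len)
			buf[0] = '\0';
		return 1;
	}

	snprintf(buf, len, "do_IRQ: %d.%u No irq handler for vector",
		 cpu, vector);
	return 0;
}
```

The point of the model is only that the complaint is raised by the receiving
CPU: an interrupt showed up on a vector for which nothing is installed there
at that moment. It says nothing about why the device sent it.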

I checked the behaviour on 4.14. The MSI write is delayed there in the same
way, but there is no spurious interrupt. There is no interrupt coming in at
all _BEFORE_ the device is put out of PCI_D0.

And this has certainly nothing to do with the vector management changes,
but I can't figure out yet what causes that spurious interrupt to be sent.

Any ideas welcome.

Thanks,

tglx