Re: Lost PCIe PME after a914ff2d78ce ("PCI/ASPM: Don't select CONFIG_PCIEASPM by default")

From: Heiner Kallweit
Date: Fri May 29 2020 - 16:53:41 EST


On 29.05.2020 22:21, Bjorn Helgaas wrote:
> [+cc Matthew]
>
> On Fri, May 29, 2020 at 10:09:08PM +0200, Heiner Kallweit wrote:
>> On 29.05.2020 21:40, Heiner Kallweit wrote:
>>> On 29.05.2020 21:21, Bjorn Helgaas wrote:
>>>> On Fri, May 29, 2020 at 08:50:46PM +0200, Heiner Kallweit wrote:
>>>>> On 28.05.2020 23:44, Heiner Kallweit wrote:
>>>>>> For whatever reason with this change (and losing ASPM control) I also
>>>>>> loose the PCIe PME interrupts. This prevents my network card from
>>>>>> resuming from runtime-suspend.
>>>>>> Reverting the change brings back ASPM control and the PCIe PME irq's.
>>>>>>
>>>>>> Affected system is a Zotac MiniPC with a N3450 CPU:
>>>>>> PCI bridge: Intel Corporation Celeron N3350/Pentium N4200/Atom E3900 Series PCI Express Port A #1 (rev fb)
>>>>>>
>>>>> I checked a little bit further and w/o ASPM control the root ports
>>>>> don't have the PME service bit set in their capabilities.
>>>>> Not sure whether this is a chipset bug or whether there's a better
>>>>> explanation. However more chipsets may have such a behavior.
>>>>
>>>> Hmm. Is the difference simply changing the PCIEASPM config symbol, or
>>>> are you booting with command-line arguments like "pcie_aspm=off"?
>>>>
>>> Only difference is the config symbol. My command line is plain and simple:
>>>
>>> Command line: initrd=\intel-ucode.img initrd=\initramfs-linux.img root=/dev/sda2 rw
>>>
>>>> What's the specific PME bit that changes in the root ports? Can you
>>>> collect the "sudo lspci -vvxxxx" output with and without ASPM?
>>>>
>>>> The capability bits are generally read-only as far as the PCI spec is
>>>> concerned, but devices have implementation-specific knobs that the
>>>> BIOS may use to change things. Without CONFIG_PCIEASPM, Linux will
>>>> not request control of LTR, and that could cause the BIOS to change
>>>> something. You should be able to see the LTR control difference in
>>>> the dmesg logging about _OSC.
>>>>
>>>>> W/o the "default y" for ASPM control we also have the situation now
>>>>> that the config option description says "When in doubt, say Y."
>>>>> but it takes the EXPERT mode to enable it. This seems to be a little
>>>>> bit inconsistent.
>>>>
>>>> We should probably remove the "if EXPERT" from the PCIEASPM kconfig.
>>>> But I would expect PME to work correctly regardless of PCIEASPM, so
>>>> removing "if EXPERT" doesn't solve the underlying problem.
>>>>
>>>> Rafael, does this ring any bells for you? I don't remember a
>>>> connection between PME and ASPM, but maybe there is one.
>>>>
>>>>> To cut a long story short:
>>>>> At least on some systems this change has unwanted side effects.
>>>
>>> lspci output w/ and w/o ASPM is attached incl. a diff.
>>> Here comes the _OSC difference.
>>>
>>> w/o ASPM
>>>
>>> [ 0.386063] acpi PNP0A08:00: _OSC: OS supports [ExtendedConfig Segments MSI HPX-Type3]
>>> [ 0.386918] acpi PNP0A08:00: _OSC: not requesting OS control; OS requires [ExtendedConfig ASPM ClockPM MSI]
>>>
>>> w/ ASPM
>>> [ 0.388141] acpi PNP0A08:00: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI HPX-Type3]
>>> [ 0.393648] acpi PNP0A08:00: _OSC: OS now controls [PME AER PCIeCapability LTR]
>>>
>>> It's at least interesting that w/o ASPM OS doesn't control PME and AER.
>>>
>>
>> This was the right entry point, also w/o ASPM control OS states to ACPI that it
>> needs ASPM and ClockPM. The following patch fixes the PME issue for me.
>> See also the _OSC part below.
>>
>>
>> diff --git a/drivers/acpi/pci_root.c b/drivers/acpi/pci_root.c
>> index 9e235c1a7..8df1fa728 100644
>> --- a/drivers/acpi/pci_root.c
>> +++ b/drivers/acpi/pci_root.c
>> @@ -38,10 +38,15 @@ static int acpi_pci_root_scan_dependent(struct acpi_device *adev)
>> return 0;
>> }
>>
>> +#ifdef CONFIG_PCIEASPM
>> #define ACPI_PCIE_REQ_SUPPORT (OSC_PCI_EXT_CONFIG_SUPPORT \
>> | OSC_PCI_ASPM_SUPPORT \
>> | OSC_PCI_CLOCK_PM_SUPPORT \
>> | OSC_PCI_MSI_SUPPORT)
>> +#else
>> +#define ACPI_PCIE_REQ_SUPPORT (OSC_PCI_EXT_CONFIG_SUPPORT \
>> + | OSC_PCI_MSI_SUPPORT)
>> +#endif
>
> Yeah, that makes sense. I can't remember the details, but I'm pretty
> sure there's a reason why we ask for the whole set of things. Seems
> like it solved some problem. I think Matthew Garrett might have been
> involved in that.
>

ACPI_PCIE_REQ_SUPPORT was introduced with 415e12b23792 ("PCI/ACPI:
Request _OSC control once for each root bridge (v3)") in 2011.
It's linked to the following bug report:
https://bugzilla.kernel.org/show_bug.cgi?id=20232

I can't say whether there's a chance that the proposed patch would
bring back the issue discussed at that time.

If somebody thinks that the proposed patch isn't the right solution,
then presumably the following needs to be touched.

if ((support & ACPI_PCIE_REQ_SUPPORT) != ACPI_PCIE_REQ_SUPPORT) {
decode_osc_support(root, "not requesting OS control; OS requires",
ACPI_PCIE_REQ_SUPPORT);
return;
}

control = OSC_PCI_EXPRESS_CAPABILITY_CONTROL
| OSC_PCI_EXPRESS_PME_CONTROL;