Re: [PATCH v7 1/1] x86/PCI: Ignore E820 reservations for bridge windows on newer systems

From: Hans de Goede
Date: Sat May 07 2022 - 06:09:24 EST


Hi Bjorn,

On 5/6/22 18:51, Bjorn Helgaas wrote:
> On Thu, May 05, 2022 at 05:20:16PM +0200, Hans de Goede wrote:
>> Some BIOS-es contain bugs where they add addresses which are already
>> used in some other manner to the PCI host bridge window returned by
>> the ACPI _CRS method. To avoid this, Linux has by default excluded
>> E820 reservations when allocating addresses since 2010, see:
>> commit 4dc2287c1805 ("x86: avoid E820 regions when allocating address
>> space").
>>
>> Recently (2019) some systems have shown up with E820 reservations which
>> cover the entire _CRS returned PCI bridge memory window, causing all
>> attempts to assign memory to PCI BARs which have not been set up by the
>> BIOS to fail. For example here are the relevant dmesg bits from a
>> Lenovo IdeaPad 3 15IIL 81WE:
>>
>> [mem 0x000000004bc50000-0x00000000cfffffff] reserved
>> pci_bus 0000:00: root bus resource [mem 0x65400000-0xbfffffff window]
>>
>> The ACPI specifications appear to allow this new behavior:
>>
>> The relationship between E820 and ACPI _CRS is not really very clear.
>> ACPI v6.3, sec 15, table 15-374, says AddressRangeReserved means:
>>
>> This range of addresses is in use or reserved by the system and is
>> not to be included in the allocatable memory pool of the operating
>> system's memory manager.
>>
>> and it may be used when:
>>
>> The address range is in use by a memory-mapped system device.
>>
>> Furthermore, sec 15.2 says:
>>
>> Address ranges defined for baseboard memory-mapped I/O devices, such
>> as APICs, are returned as reserved.
>>
>> A PCI host bridge qualifies as a baseboard memory-mapped I/O device,
>> and its apertures are in use and certainly should not be included in
>> the general allocatable pool, so the fact that some BIOS-es report
>> the PCI aperture as "reserved" in E820 doesn't seem like a BIOS bug.
>>
>> So it seems that excluding E820 reserved addresses is a mistake.
>>
>> Ideally Linux would fully stop excluding E820 reserved addresses,
>> but then various old systems will regress.
>> Instead keep the old behavior for old systems, while ignoring
>> the E820 reservations for any systems from now on.
>>
>> Old systems are defined here as BIOS year < 2018. This was chosen to
>> make sure that pci_use_e820 will not be set on the currently affected
>> systems; the oldest known one is from 2019.
>>
>> Testing has shown that some newer systems also have a bad _CRS return.
>> The pci_crs_quirks DMI table is used to keep excluding E820 reservations
>> from the bridge window on these systems.
>>
>> Also add pci=no_e820 and pci=use_e820 options to allow overriding
>> the BIOS year + DMI matching logic.
>>
>> BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=206459
>> BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1868899
>> BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1871793
>> BugLink: https://bugs.launchpad.net/bugs/1878279
>> BugLink: https://bugs.launchpad.net/bugs/1931715
>> BugLink: https://bugs.launchpad.net/bugs/1932069
>> BugLink: https://bugs.launchpad.net/bugs/1921649
>> Cc: Benoit Grégoire <benoitg@xxxxxxxx>
>> Cc: Hui Wang <hui.wang@xxxxxxxxxxxxx>
>> Signed-off-by: Hans de Goede <hdegoede@xxxxxxxxxx>
>
>> +	 * Ideally Linux would fully stop using E820 reservations, but then
>> +	 * various old systems will regress. Instead keep the old behavior for
>> +	 * old systems + known to be broken newer systems in pci_crs_quirks.
>> +	 */
>> +	if (year >= 0 && year < 2018)
>> +		pci_use_e820 = true;
>
> How did you pick 2018? Prior to this patch, we used E820 reservations
> for all machines. This patch would change that for 2019-2022
> machines, so there's a risk of breaking some of them.

Correct. I picked 2018 because the first devices on which using E820
reservations causes issues (an i2c controller not getting resources,
leading to a non-working touchpad, or Thunderbolt hotplug issues) have
BIOS dates starting in 2019. I added a year of margin, so we could
also make this 2019.
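
To illustrate why these machines end up with no usable resources, here
is a minimal user-space sketch of the clipping, applied to the Lenovo
IdeaPad ranges quoted above. The struct and the clip_window() helper
are hypothetical names made up for this illustration; this is not the
kernel's implementation, and it simplifies the case where a reservation
sits in the middle of the window.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Inclusive address range, like the [mem start-end] lines in dmesg. */
struct range {
	uint64_t start;
	uint64_t end;
};

/*
 * Hypothetical helper: trim 'win' so that it no longer overlaps 'rsv'.
 * Returns false when nothing usable is left. Simplification: when the
 * reservation sits strictly inside the window, only the lower piece
 * of the window is kept.
 */
static bool clip_window(struct range *win, struct range rsv)
{
	assert(win->start <= win->end && rsv.start <= rsv.end);

	if (rsv.end < win->start || rsv.start > win->end)
		return true;			/* no overlap, window unchanged */
	if (rsv.start <= win->start && rsv.end >= win->end)
		return false;			/* window fully covered */
	if (rsv.start <= win->start)
		win->start = rsv.end + 1;	/* trim the low end */
	else
		win->end = rsv.start - 1;	/* trim the high end */
	return true;
}

/* Usage example with the ranges from the quoted dmesg output. */
void clip_window_example(void)
{
	struct range window = { 0x65400000ULL, 0xbfffffffULL };
	struct range reserved = { 0x4bc50000ULL, 0xcfffffffULL };

	/* The reservation covers the whole window, so nothing is left. */
	assert(!clip_window(&window, reserved));
}
```

With the whole window gone, any BAR the BIOS did not set up has nowhere
to be placed, which matches the failures in the bug reports.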

> I'm hesitant about changing the behavior for machines already in the
> field because if they were tested at all with Linux, it was without
> this patch. So I would lean toward preserving the current behavior
> for BIOS year < 2023.

I see. I presume the idea is to then use DMI to disable E820 clipping
on current devices where this is known to cause problems?

So for v8 I would:

1. Change the cut-off check to < 2023
2. Drop the DMI quirks I added for models which are known to need E820
clipping, as these would then be covered by the new cut-off check
3. Add DMI quirks for models for which it is known that we must _not_
do E820 clipping

Is this the direction you want to go / does that sound right?

Note that the DMI list for 3. will very likely be incomplete at first,
but I can ask around for testing once we have settled on this approach
and do one or more follow-up patches to extend the list.
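
The decision logic for the plan above could be sketched roughly as
follows. This is a user-space illustration only: the function name, the
enum and the single dmi_no_e820 flag are hypothetical, the 2023 cut-off
is just the value under discussion, and letting a DMI quirk win over
the year check is my assumption about how step 3 would be wired up.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical command-line state: pci=use_e820, pci=no_e820 or neither. */
enum e820_override { OVERRIDE_NONE, OVERRIDE_USE_E820, OVERRIDE_NO_E820 };

/* Cut-off proposed in this thread for v8; not a settled value. */
#define E820_CUTOFF_YEAR 2023

static bool want_e820_clipping(int bios_year, bool dmi_no_e820,
			       enum e820_override override)
{
	bool use_e820 = false;

	/* 1. Keep the old behavior for BIOS years before the cut-off. */
	if (bios_year >= 0 && bios_year < E820_CUTOFF_YEAR)
		use_e820 = true;

	/* 3. A DMI quirk marks models where clipping must not be done. */
	if (dmi_no_e820)
		use_e820 = false;

	/* The command line overrides both the year check and the quirks. */
	if (override == OVERRIDE_USE_E820)
		use_e820 = true;
	else if (override == OVERRIDE_NO_E820)
		use_e820 = false;

	return use_e820;
}

/* Usage example: a few cases and the result each should give. */
void e820_policy_examples(void)
{
	assert(want_e820_clipping(2017, false, OVERRIDE_NONE));
	assert(!want_e820_clipping(2019, true, OVERRIDE_NONE));
	assert(!want_e820_clipping(2023, false, OVERRIDE_NONE));
	assert(want_e820_clipping(2023, false, OVERRIDE_USE_E820));
}
```

A negative bios_year stands for an unknown BIOS date here and falls
through to not clipping, mirroring the year >= 0 test in the patch.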


>> diff --git a/arch/x86/pci/common.c b/arch/x86/pci/common.c
>> index 9e1e6b8d8876..7e6f79aab6a8 100644
>> --- a/arch/x86/pci/common.c
>> +++ b/arch/x86/pci/common.c
>> @@ -595,6 +595,12 @@ char *__init pcibios_setup(char *str)
>>  	} else if (!strcmp(str, "nocrs")) {
>>  		pci_probe |= PCI_ROOT_NO_CRS;
>>  		return NULL;
>> +	} else if (!strcmp(str, "use_e820")) {
>> +		pci_probe |= PCI_USE_E820;
>
> I think we should add_taint(TAINT_FIRMWARE_WORKAROUND) for both these
> cases.

Ok, I'll add this for v8.
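
Something along these lines, shown here as a user-space sketch rather
than the actual v8 code. The flag values are placeholders, and
taint_firmware_workaround() is a hypothetical stand-in for calling the
kernel's add_taint() with TAINT_FIRMWARE_WORKAROUND. Like
pcibios_setup(), the helper returns NULL when it has consumed the
option and the unchanged string otherwise.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Placeholder bit values for this sketch, not the kernel's definitions. */
#define PCI_USE_E820	0x1
#define PCI_NO_E820	0x2

static unsigned int pci_probe;
static bool firmware_workaround_tainted;

/* Hypothetical stand-in for add_taint(TAINT_FIRMWARE_WORKAROUND, ...). */
static void taint_firmware_workaround(void)
{
	firmware_workaround_tainted = true;
}

/* Returns NULL when the option was handled, 'str' otherwise. */
static const char *parse_e820_option(const char *str)
{
	if (!strcmp(str, "use_e820")) {
		pci_probe |= PCI_USE_E820;
		taint_firmware_workaround();
		return NULL;
	} else if (!strcmp(str, "no_e820")) {
		pci_probe |= PCI_NO_E820;
		taint_firmware_workaround();
		return NULL;
	}
	return str;
}

/* Usage example: an unrelated option is passed through untouched. */
void parse_e820_option_example(void)
{
	pci_probe = 0;
	firmware_workaround_tainted = false;

	assert(parse_e820_option("nocrs") != NULL);
	assert(!firmware_workaround_tainted);

	assert(parse_e820_option("no_e820") == NULL);
	assert(pci_probe == PCI_NO_E820);
	assert(firmware_workaround_tainted);
}
```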

>
> We probably should do it for *all* the parameters here, but that would
> be a separate discussion.
>
>> +		return NULL;
>> +	} else if (!strcmp(str, "no_e820")) {
>> +		pci_probe |= PCI_NO_E820;
>> +		return NULL;
>>  #ifdef CONFIG_PHYS_ADDR_T_64BIT
>>  	} else if (!strcmp(str, "big_root_window")) {
>>  		pci_probe |= PCI_BIG_ROOT_WINDOW;
>> --
>> 2.36.0
>>
>


Regards,

Hans