Re: [PATCH v2 0/3] x86/PCI: Log E820 clipping

From: Hans de Goede
Date: Thu May 05 2022 - 11:03:40 EST


Hi,

On 5/2/22 22:32, Bjorn Helgaas wrote:
> On Mon, May 02, 2022 at 02:24:26PM +0200, Hans de Goede wrote:
>> On 4/19/22 18:45, Bjorn Helgaas wrote:
>>> On Tue, Apr 19, 2022 at 05:16:44PM +0200, Hans de Goede wrote:
>>>> On 4/19/22 17:03, Bjorn Helgaas wrote:
>>>>> On Tue, Apr 19, 2022 at 11:59:17AM +0200, Hans de Goede wrote:
>
>>>>>> So what is the plan to actually fix the issue seen on some
>>>>>> Lenovo models and Clevo Barebones ? As I mentioned previously
>>>>>> I think that since all our efforts have failed so far that we
>>>>>> should maybe reconsider just using DMI quirks to ignore the
>>>>>> E820 reservation windows for host bridges on affected models ?
>>>>>
>>>>> I have been resisting DMI quirks but I'm afraid there's no other
>>>>> way.
>>>>
>>>> Well there is the first match adjacent windows returned by _CRS
>>>> and only then do the "covers whole region" exception check. I
>>>> still think that would work at least for the chromebook
>>>> regression...
>>>
>>> Without a crystal clear strategy, I think we're going to be
>>> tweaking the algorithm forever as the _CRS/E820 mix changes.
>>> That's why I think that in the long term, a "use _CRS only, with
>>> quirks for exceptions" strategy will be simplest.
>>
>> Looking at the amount of exception we already now about I'm not sure
>> if that will work well.
>
> It's possible that many quirks will be required. But I think in the
> long run the value of the simplest, most obvious strategy is huge.
> It's laid out in the spec already and it's the clearest way to
> agreement between firmware and OS. When we trip over something, it's
> very easy to determine whether _CRS is wrong or Linux is using it
> wrong. If we have to bring in question of looking at E820 entries,
> possibly merging them, using them or not based on overlaps ... that's
> a much more difficult conversation without a clear resolution.
>
>>> So I think we should go ahead with DMI quirks instead of trying to
>>> make the algorithm smarter, and yes, I think we will need commandline
>>> arguments, probably one to force E820 clipping for future machines,
>>> and one to disable it for old machines.
>>
>> So what you are suggesting is to go back to a bios-date based approach
>> (to determine old vs new machines) combined with DMI quirks to force
>> E820 clipping on new machines which turn out to need it despite them
>> being new ?
>
> Yes. It's ugly but I think the 10-year outlook is better.

Ok, I've brushed off one of my earlier patches doing this and
added DMI quirks for the "Lenovo X1 Carbon 2nd gen" suspend
issue + the Asus C523NA / Google Coral Chromebook not booting
issues which we already know will get triggered by this based
on earlier testing.

I'll send this out right after this email.

Note this new patch is based on top of your:

https://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git/log/?h=pci/resource

>> I have the feeling that if we switch to top-down allocating
>> that we can then switch to just using _CRS and that everything
>> will then just work, because we then match what Windows is doing...
>
> Yes, it might. But I'm not 100% comfortable because it basically
> sweeps _CRS bugs under the rug, and we may trip over them as we do
> more hotplug and (eventually) resource rebalancing. I think we need
> to work toward getting _CRS more reliable.

Ok.

Regards,

Hans