Re: [PATCH 1/1] cxl/acpi.c: Add buggy BIOS hint for CXL ACPI lookup failure

From: PJ Waskiewicz
Date: Wed May 01 2024 - 11:28:32 EST


On Mon, 2024-04-29 at 11:35 -0700, Dan Williams wrote:
> Bjorn Helgaas wrote:
> > On Sun, Apr 28, 2024 at 10:57:13PM -0700, PJ Waskiewicz wrote:
> > > On Tue, 2024-04-09 at 08:22 -0500, Bjorn Helgaas wrote:
> > > > On Sun, Apr 07, 2024 at 02:05:26PM -0700, ppwaskie@xxxxxxxxxx wrote:
> > > > > From: PJ Waskiewicz <ppwaskie@xxxxxxxxxx>
> > > > >
> > > > > Currently, Type 3 CXL devices (CXL.mem) can train using host CXL
> > > > > drivers on Emerald Rapids systems.  However, on some production
> > > > > systems from some vendors, a buggy BIOS exists that improperly
> > > > > populates the ACPI => PCI mappings.
> > > >
> > > > Can you be more specific about what this ACPI => PCI mapping is?
> > > > If you already know what the problem is, I'm sure this is obvious,
> > > > but otherwise it's not.
> [..]
> > It's just a buggy BIOS that doesn't supply _UID for an ACPI0016
> > object, so you can't locate the corresponding CEDT entry, right?
>
> Correct, the problem is 100% contained to ACPI, and PCI is innocent. The
> ACPI bug leads to failures to associate ACPI host-bridge objects with
> CEDT.CHBS entries.

Sorry for the confusion here!! I was definitely not trying to blame
PCI. :)

>
> ACPI to PCI association is then typical pci_root lookup, i.e.:
>
>         pci_root = acpi_pci_find_root(hb->handle);
>         bridge = pci_root->bus->bridge;

Yes, this here. In my use case, I'm starting with a PCIe/CXL device.
In my driver, I try to discover the host bridge, and then the ACPI _UID
so I can look things up in the CEDT.

So I'm trying to do the programmatic equivalent of this:

Start here in my PCIe/CXL host driver:

/sys/devices/pci0000:37/firmware_node =>
./LNXSYSTM:00/LNXSYBUS:00/ACPI0016:02

Retrieve _UID (uid) from /sys/devices/pci0000:37/firmware_node/uid

With the buggy BIOS, the value above resolves to CX02, when it *should*
be 49. This is very much a bug in the ACPI arena.
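
For anyone who wants to check a box for this from userspace, here is a
rough sketch of the sysfs walk above. It is not part of the patch, and
the helper names (parse_numeric_uid, read_numeric_uid) are made up for
illustration. It assumes a sane _UID shows up in the uid file as a plain
decimal number, and treats anything else (such as CX02) as the bad case:

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse a _UID string as an unsigned decimal integer.
 * Returns 0 and stores the value on success, -1 if the string is not
 * purely numeric (e.g. "CX02"). Assumes base 10.
 */
int parse_numeric_uid(const char *s, unsigned long long *out)
{
	char *end;
	unsigned long long val;

	/* Reject NULL, empty strings, signs, and leading whitespace. */
	if (!s || *s < '0' || *s > '9')
		return -1;

	errno = 0;
	val = strtoull(s, &end, 10);
	if (errno)
		return -1;

	/* sysfs attributes end in a newline; allow exactly one. */
	if (*end == '\n')
		end++;
	if (*end != '\0')
		return -1;

	*out = val;
	return 0;
}

/*
 * Hypothetical helper: read the first line of <path> (for example the
 * firmware_node/uid attribute of a host bridge) and report whether it
 * holds a numeric _UID. Returns 0 on success, -1 on any failure.
 */
int read_numeric_uid(const char *path, unsigned long long *out)
{
	char buf[64];
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);

	return parse_numeric_uid(buf, out);
}
```

Calling read_numeric_uid() on /sys/devices/pci0000:37/firmware_node/uid
would return -1 on the affected systems, since CX02 is not a number.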

The kernel APIs that let me walk this path fail in
acpi_evaluate_object() when handed the bad _UID (CX02).

Again, sorry for the confusion if it looked like I was trying to
implicate PCI in any way. The whole intent here was to leave some
breadcrumbs so anyone else running into this wouldn't be left
scratching their heads wondering wtf was going on.

Cheers,
-PJ