Re: [PATCH v2] PCI/AER: Consolidate CXL, ACPI GHES and native AER reporting paths

From: Karolina Stolarek
Date: Fri Apr 25 2025 - 06:35:10 EST


On 24/04/2025 19:28, Bjorn Helgaas wrote:
> [+to Yijun @Dell in case there's some testing opportunity, thread at
> https://lore.kernel.org/r/81c040d54209627de2d8b150822636b415834c7f.1742900213.git.karolina.stolarek@xxxxxxxxxx]
>
> On Thu, Apr 24, 2025 at 11:01:11AM +0200, Karolina Stolarek wrote:
>> The only way to inject GHES errors I'm aware of is Mauro's patch for
>> qemu[1], so I went down the virtualization path. As for working with the
>> actual hardware, I'd need to ask around and learn more about the platform.
>
> I'd be surprised if the qemu firmware supports firmware-first
> handling, so I wouldn't expect to be able to exercise this path that
> way. I think there are some bits in HEST and similar tables that tell
> us about this, e.g., ACPI r6.5, sec 18.3.2.4.

It's possible that some of the nuances of this escaped me. I decided to
pick up the series, as I saw the "PCI Express bus error injection via GHES"
script and thought it might be useful.

> Unfortunately there are some typos in the spec (FIRMWARE_FIRST,
> FIRMWAREFIRST in 18.4), so it's a little hard to find all the
> references.

Thanks for the pointers, I'll take a look.

> It's a long shot, but I added Yijun as a Dell contact who might
> have a pointer to someone who could possibly test GHES logging on a
> Dell box with and without your patch so we could have a concrete
> comparison of the dmesg log differences.

Thank you very much. Let's see, maybe we'll get lucky :)


> If you can't produce actual logs for comparison, I think we can take
> info from a sample log somebody has posted and synthesize what the
> changes would be after this patch.

I also found some logs at some point, mostly from 2021 and 2023, but I felt
bad about mocking up the messages and tried to produce actual logs. If I
can't find a way to get this working in two weeks, I'll revisit this idea.

All the best,
Karolina

-------------------------------------------------------------
[1] - https://lore.kernel.org/lkml/76824dfc6bb5dd23a9f04607a907ac4ccf7cb147.1740653898.git.mchehab+huawei@xxxxxxxxxx/