RE: [RESEND PATCH v3 3/9] EDAC/ghes: Make ghes_edac a proper module to remove the dependency on ghes

From: Elliott, Robert (Servers)
Date: Fri Aug 26 2022 - 18:43:29 EST




> -----Original Message-----
> From: Jia He <justin.he@xxxxxxx>
> Sent: Monday, August 22, 2022 10:41 AM
> Subject: [RESEND PATCH v3 3/9] EDAC/ghes: Make ghes_edac a proper module to
> remove the dependency on ghes

1. I suggest adding:
MODULE_ALIAS("acpi*")

so udev will automatically load the module on any system with ACPI.

> drivers/edac/Kconfig
> config EDAC_GHES
> + tristate "Output ACPI APEI/GHES BIOS detected errors via EDAC"

2. I suggest:
tristate "APEI (ACPI Platform Error Interfaces) GHES (Generic Hardware Error Source)"

That's in a menu of EDAC drivers, so no suffix is needed.

3. The Kconfig help text needs some updates, since the drivers are now ordering
themselves to avoid race conditions.

Current:
Not all machines support hardware-driven error report. Some of those
provide a BIOS-driven error report mechanism via ACPI, using the
APEI/GHES driver. By enabling this option, the error reports provided
by GHES are sent to userspace via the EDAC API.

When this option is enabled, it will disable the hardware-driven
mechanisms, if a GHES BIOS is detected, entering into the
"Firmware First" mode.

It should be noticed that keeping both GHES and a hardware-driven
error mechanism won't work well, as BIOS will race with OS, while
reading the error registers. So, if you want to not use "Firmware
first" GHES error mechanism, you should disable GHES either at
compilation time or by passing "ghes.disable=1" Kernel parameter
at boot time.

In doubt, say 'Y'.

Suggestion:
Support for error detection and correction based on APEI (ACPI Platform
Error Interfaces), which allows system firmware to report hardware errors
via the HEST (Hardware Error Source Table) using GHES (Generic Hardware
Error Source) records. Some systems perform "firmware first" processing
of errors before reporting them.

This module is supported in systems supporting GHES. If the architecture
is x86, the module only loads if the platform is listed in a known-good
platform list (see drivers/edac/ghes_edac.c) or if ghes.force_load=1
is specified on the kernel command line).


4. In the help text for each module that looks for GHES and refuses to load
(e.g., EDAC_AMD64), add a sentence:

This module does not load on a system supporting ACPI GHES.

> drivers/acpi/apei/ghes.c
> +MODULE_DESCRIPTION("Output ACPI APEI/GHES BIOS detected errors module via EDAC");

5. I suggest:
MODULE_DESCRIPTION("APEI (ACPI Platform Error Interfaces) GHES (Generic Hardware Error Source) EDAC driver")