[PATCH v2 0/7] New way to track mce notifier chain actions

From: Tony Luck
Date: Fri Feb 14 2020 - 17:28:39 EST


Parts 1 & 2 are just cleanup. CEC should follow the same rules
as everyone else who wants to be on the mce notifier chain. No
real reason for it to have direct hooks into mce/core.c
[No substantive change since RFC version 1, but note that
I have kept the change to make CEC a "normal" user of
the mce notifier chain. Result is a few checks for
if (mce->kflags & MCE_HANDLED_CEC) in EDAC etc. drivers.]

Part 3 adds a field to struct mce, and defines the BIT fields
for each class of notifier. All EDAC drivers share the same BIT
since only one of them should be active.
[Boris: Changed name of new field to "kflags" and made
it __u64, so plenty of space for possible future
other uses]

Part 4 Re-done since draft based on Luto and Tglx comments that
we should kill of all usage of NOTIFY_STOP. This patch
now gets rid of all but one. That's an AMD case where
it looks like they don't want to decode some particular
errors on a specific platform. The right fix for that
is to take Luto's advice and filter out before that item
gets to the notifier chain. We even already have a filter
function (filter_mce) to do that! But that change needs
to be handled by someone with the appropriate h/w.

Part 5 Now just checks for mce->kflags in the default handler at
the end of the chain to decide whether to print.

Part 6 NEW - add mce=print_all option to override default and
print everything to the console. Intended for debug, or
desperation scenarios where other logs are lost.

Part 7 NEW - Delete the code that tries to make sure only one
out of acpi_extlog and the current loaded EDAC driver
deals with an error.


Tony Luck (7):
x86/mce: Rename "first" function as "early"
x86/mce: Convert corrected error collector to use mce notifier
x86/mce: Add new "kflags" field to "struct mce"
x86/mce: Fix all mce notifiers to update the mce->kflags bitmask
x86/mce: Change default mce logger to check mce->kflags
x86/mce: Add mce=print_all option
x86/mce: Drop the EDAC report status checks

arch/x86/include/asm/mce.h | 15 +++----
arch/x86/include/uapi/asm/mce.h | 9 ++++
arch/x86/kernel/cpu/mce/core.c | 58 ++++++++------------------
arch/x86/kernel/cpu/mce/dev-mcelog.c | 5 +++
arch/x86/kernel/cpu/mce/internal.h | 1 +
drivers/acpi/acpi_extlog.c | 19 ++-------
drivers/acpi/nfit/mce.c | 1 +
drivers/edac/edac_mc.c | 61 ----------------------------
drivers/edac/i7core_edac.c | 5 ++-
drivers/edac/mce_amd.c | 9 +++-
drivers/edac/pnd2_edac.c | 8 ++--
drivers/edac/sb_edac.c | 7 ++--
drivers/edac/skx_common.c | 3 +-
drivers/ras/cec.c | 29 +++++++++++++
include/linux/edac.h | 8 ----
15 files changed, 91 insertions(+), 147 deletions(-)


base-commit: b19e8c68470385dd2c5440876591fddb02c8c402
--
2.21.1