[PATCH] mce: include cmci during intel feature clearing
From: JP Kobryn
Date: Tue Jun 17 2025 - 17:48:24 EST
It was found that after a kexec on an intel CPU, MCE reporting was no
longer active. The root cause has been found to be that ownership of CMCI
banks is not cleared during the shutdown phase. As a result, when CPU's
come back online, they are unable to rediscover these occupied banks. If we
clear these CPU associations before booting into the new kernel, the CMCI
banks can be reclaimed and MCE reporting will become functional once more.
The existing code does seem to have the intention of clearing MCE-related
features via mcheck_cpu_clear(). During a kexec reboot, there are two
sequences that reach a call to mcheck_cpu_clear(). They are:
1) Stopping other (remote) CPU's via IPI:
native_machine_shutdown()
stop_other_cpus()
smp_ops.stop_other_cpus(1)
x86 smp: native_stop_other_cpus(1)
apic_send_IPI_allbutself(REBOOT_VECTOR)
...IPI is received on remote CPU's and IDT sysvec_reboot invoked:
stop_this_cpu()
mcheck_cpu_clear(this_ptr_cpu(&cpu_info))
2) Seqence of stopping the active CPU (the one performing the kexec):
native_machine_shutdown()
stop_other_cpus()
smp_ops.stop_other_cpus(1)
x86 smp: native_stop_other_cpus(1)
mcheck_cpu_clear(this_ptr_cpu(&cpu_info))
In both cases, the call to mcheck_cpu_clear() leads to the vendor specific
call to intel_feature_clear():
mcheck_cpu_clear(this_ptr_cpu(&cpu_info))
__mcheck_cpu_clear_vendor(c)
switch (c->x86_vendor)
case X86_VENDOR_INTEL:
mce_intel_feature_clear(c)
Now looking at the pair of functions mce_intel_feature_{init,clear}, there
are 3 MCE features setup on the init side:
mce_intel_feature_init(c)
intel_init_cmci()
intel_init_lmce()
intel_imc_init(c)
On the other side in the clear function, only one of these features is
actually cleared:
mce_intel_feature_clear(c)
intel_clear_lmce()
Just focusing on the feature pertaining to the root cause of the kexec
issue, there would be a benefit if we additionally cleared the CMCI feature
within this routine - the banks would be free for acquisition on the boot
up side of a kexec. This patch adds the call to clear CMCI to this intel
routine.
Reported-by: Aijay Adams <aijay@xxxxxxxx>
Signed-off-by: JP Kobryn <inwardvessel@xxxxxxxxx>
---
arch/x86/kernel/cpu/mce/intel.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/x86/kernel/cpu/mce/intel.c b/arch/x86/kernel/cpu/mce/intel.c
index efcf21e9552e..9b149b9c4109 100644
--- a/arch/x86/kernel/cpu/mce/intel.c
+++ b/arch/x86/kernel/cpu/mce/intel.c
@@ -478,6 +478,7 @@ void mce_intel_feature_init(struct cpuinfo_x86 *c)
void mce_intel_feature_clear(struct cpuinfo_x86 *c)
{
intel_clear_lmce();
+ cmci_clear();
}
bool intel_filter_mce(struct mce *m)
--
2.47.1