[PATCH v2 1/2] x86: mce: kexec: turn off MCE in kexec

From: Naoya Horiguchi
Date: Fri Feb 27 2015 - 00:02:54 EST


kexec disables (or "shoots down") all CPUs other than the crashing CPU before
entering the 2nd kernel. However, the MCE handler remains enabled after that.
If an MCE occurs and is broadcast to all CPUs after the main thread has
started the 2nd kernel (which might not have set up MCE handling yet, or might
decide not to set it up at all), the MCE handler runs only on the other CPUs
and not on the main thread. MCE synchronization then times out and the kernel
panics. The user-visible effect of this bug is kdump failure.
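
The failure mode described above can be illustrated with a small userspace
model. This is only a sketch, not kernel code: the enum, the function name
broadcast_mce() and the per-CPU flag array are hypothetical, and the outcome
for "no CPU responds" is a simplification that assumes error reporting has
been switched off everywhere, so no handler is entered at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcomes of one broadcast MCE in this simplified model. */
enum mce_outcome {
	MCE_NOT_RAISED,		/* no CPU enters the handler */
	MCE_HANDLED,		/* every CPU enters the handler and they synchronize */
	MCE_SYNC_TIMEOUT_PANIC	/* some CPUs wait for others that never arrive */
};

/*
 * handler_runs[i] says whether CPU i would still enter the first kernel's
 * MCE handler. The handler expects all ncpus CPUs to check in; if only a
 * subset does, the waiting CPUs hit the synchronization timeout.
 */
enum mce_outcome broadcast_mce(const bool *handler_runs, size_t ncpus)
{
	size_t entered = 0;

	assert(handler_runs != NULL && ncpus > 0);

	for (size_t i = 0; i < ncpus; i++)
		if (handler_runs[i])
			entered++;

	if (entered == 0)
		return MCE_NOT_RAISED;
	if (entered == ncpus)
		return MCE_HANDLED;
	return MCE_SYNC_TIMEOUT_PANIC;
}
```

In this model, the situation in the report corresponds to CPU 0 (the main
thread, already in the 2nd kernel) having its flag cleared while the
shot-down CPUs still have theirs set, which yields MCE_SYNC_TIMEOUT_PANIC.
Clearing the flag on every CPU yields MCE_NOT_RAISED instead.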

Note that this problem has existed since the current MCE handler was
implemented in 2.6.32. Recently, commit 716079f66eac ("mce: Panic when a core
has reached a timeout") made it more visible by changing the default behavior
on synchronization timeout from "ignore" to "panic".

This patch turns off MCE handling in the kdump context by clearing
MSR_IA32_MCG_CTL, MSR_IA32_MCx_CTL, and CR4.MCE on every CPU, including the
crashing one.

Signed-off-by: Naoya Horiguchi <n-horiguchi@xxxxxxxxxxxxx>
Cc: <stable@xxxxxxxxxxxxxxx> [2.6.32+]
---
ChangeLog v1 -> v2
- clear MSR_IA32_MCG_CTL, MSR_IA32_MCx_CTL, and CR4.MCE instead of using a
  global flag to ignore MCE events.
- fixed the description of the problem
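
For readers who want to reason about the v2 sequence outside the kernel, the
order of operations (global MCG_CTL first, then the per-bank MCx_CTL
registers, then CR4.MCE) can be modeled in userspace roughly as below. This is
a sketch under stated assumptions: struct mock_cpu and
mock_emergency_mce_disable() are hypothetical stand-ins for real MSR and CR4
accesses, and the loop clears every bank reported in MCG_CAP, which is a
simplification of what the in-kernel helper may do.

```c
#include <assert.h>
#include <stdint.h>

#define MOCK_MAX_BANKS   32
#define MCG_BANKCNT_MASK 0xffu      /* MCG_CAP bits 7:0: number of banks */
#define MCG_CTL_P        (1u << 8)  /* MCG_CAP bit 8: MCG_CTL is present */
#define X86_CR4_MCE      (1u << 6)  /* CR4 bit 6: machine-check enable */

/* Hypothetical in-memory copy of the per-CPU state the patch touches. */
struct mock_cpu {
	uint64_t mcg_cap;
	uint64_t mcg_ctl;
	uint64_t mci_ctl[MOCK_MAX_BANKS];
	uint64_t cr4;
};

/* Model of the disable sequence: global switch, banks, then CR4.MCE. */
void mock_emergency_mce_disable(struct mock_cpu *cpu)
{
	unsigned int banks = cpu->mcg_cap & MCG_BANKCNT_MASK;

	assert(banks <= MOCK_MAX_BANKS);

	/* MCG_CTL only exists when MCG_CAP advertises it. */
	if (cpu->mcg_cap & MCG_CTL_P)
		cpu->mcg_ctl = 0;

	/* Stop error reporting from each bank. */
	for (unsigned int i = 0; i < banks; i++)
		cpu->mci_ctl[i] = 0;

	/* Finally clear the machine-check enable bit, leaving other bits alone. */
	cpu->cr4 &= ~(uint64_t)X86_CR4_MCE;
}
```

Clearing the global switch before the per-bank registers mirrors the order
used in the patch; the model does not attempt to reproduce hardware behavior
when an error arrives in the middle of the sequence.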
---
arch/x86/include/asm/mce.h | 1 +
arch/x86/kernel/cpu/mcheck/mce.c | 17 +++++++++++++++++
arch/x86/kernel/crash.c | 8 ++++++++
3 files changed, 26 insertions(+)

diff --git v3.19.orig/arch/x86/include/asm/mce.h v3.19/arch/x86/include/asm/mce.h
index 51b26e895933..7ae9927d781a 100644
--- v3.19.orig/arch/x86/include/asm/mce.h
+++ v3.19/arch/x86/include/asm/mce.h
@@ -175,6 +175,7 @@ static inline void mce_amd_feature_init(struct cpuinfo_x86 *c) { }
#endif

int mce_available(struct cpuinfo_x86 *c);
+void cpu_emergency_mce_disable(void);

DECLARE_PER_CPU(unsigned, mce_exception_count);
DECLARE_PER_CPU(unsigned, mce_poll_count);
diff --git v3.19.orig/arch/x86/kernel/cpu/mcheck/mce.c v3.19/arch/x86/kernel/cpu/mcheck/mce.c
index 3112b79ace8e..10359ae1f558 100644
--- v3.19.orig/arch/x86/kernel/cpu/mcheck/mce.c
+++ v3.19/arch/x86/kernel/cpu/mcheck/mce.c
@@ -2105,6 +2105,23 @@ static void mce_syscore_shutdown(void)
}

/*
+ * Called on the kdump entry path to turn off MCE handling. The global switch
+ * is cleared first to avoid a window where only some CPUs still respond to an
+ * MCE, which would make the kernel panic on the synchronization timeout.
+ */
+void cpu_emergency_mce_disable(void)
+{
+	u64 cap;
+	int i;
+
+	rdmsrl(MSR_IA32_MCG_CAP, cap);
+	if (cap & MCG_CTL_P)
+		wrmsr(MSR_IA32_MCG_CTL, 0, 0);
+	mce_disable_error_reporting();
+	clear_in_cr4(X86_CR4_MCE);
+}
+
+/*
* On resume clear all MCE state. Don't want to see leftovers from the BIOS.
* Only one CPU is active at this time, the others get re-added later using
* CPU hotplug:
diff --git v3.19.orig/arch/x86/kernel/crash.c v3.19/arch/x86/kernel/crash.c
index aceb2f90c716..22451c687fca 100644
--- v3.19.orig/arch/x86/kernel/crash.c
+++ v3.19/arch/x86/kernel/crash.c
@@ -34,6 +34,7 @@
#include <asm/cpu.h>
#include <asm/reboot.h>
#include <asm/virtext.h>
+#include <asm/mce.h>

/* Alignment required for elf header segment */
#define ELF_CORE_HEADER_ALIGN 4096
@@ -112,6 +113,8 @@ static void kdump_nmi_callback(int cpu, struct pt_regs *regs)
#endif
crash_save_cpu(regs, cpu);

+ cpu_emergency_mce_disable();
+
/*
* VMCLEAR VMCSs loaded on all cpus if needed.
*/
@@ -157,6 +160,11 @@ void native_machine_crash_shutdown(struct pt_regs *regs)
/* The kernel is broken so disable interrupts */
local_irq_disable();

+	/*
+	 * We can't expect MCE handling to work any more, so turn it off.
+	 */
+	cpu_emergency_mce_disable();
+
kdump_nmi_shootdown_cpus();

/*
--
1.9.3