Re: [PATCH -v2] x86/boot/compressed: Register dummy NMI handler in EFI boot loader, to avoid kdump crashes

From: Zeng Heng
Date: Tue Jan 10 2023 - 08:43:14 EST



On 2023/1/10 20:57, Borislav Petkov wrote:
> On Tue, Jan 10, 2023 at 08:32:07PM +0800, Zeng Heng wrote:
>> mce is registered as an NMI handler by inject_init().
> That's a handler for the NMI raised by raise_mce(). That's for the injection
> case, which is simulated. If you're fixing the injection case, then surely not
> with a bogus boot NMI handler.

OK, mce-injection is the simulated one.


>> Yes, exactly. The procedure is as follows:
>>
>> panic() -> relocate_kernel() -> identity_mapped() -> x86 purgatory image ->
>> EFI loader -> secondary kernel
> I'm doubtful now as you're injecting errors so you're not really in #MC context
> but in this contrived context which is actually an NMI one. So we need to think
> about how to fix this case.
>
> Certainly not with an empty NMI handler...
>
> Regardless, we should do

> diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
> index 7832a69d170e..57fe376ed049 100644
> --- a/arch/x86/kernel/cpu/mce/core.c
> +++ b/arch/x86/kernel/cpu/mce/core.c
> @@ -286,6 +286,8 @@ static noinstr void mce_panic(const char *msg, struct mce *final, char *exp)
>  	if (!fake_panic) {
>  		if (panic_timeout == 0)
>  			panic_timeout = mca_cfg.panic_timeout;
> +
> +		mce_wrmsrl(MSR_IA32_MCG_STATUS, 0);
>  		panic(msg);
>  	} else
>  		pr_emerg(HW_ERR "Fake kernel panic: %s\n", msg);

> so that we do not run kexec in #MC context.

Hmmm.

I don't have a test case ready for a real MCE, so I cannot verify whether it has exited #MC context before panic() or not.

In the mce-inject case, which is based on NMI, it indeed doesn't work, as mentioned.
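
For reference, here is a rough userspace sketch of what the proposed write is meant to achieve for a real #MC. It is only a model under stated assumptions: fake_mcg_status, model_wrmsrl(), model_rdmsrl(), in_mc_context() and model_mce_panic_clears_mcip() are hypothetical names that do not exist in the kernel, and "in #MC context" is approximated here as MCG_STATUS.MCIP being set. Only the MSR number and the bit positions are architectural.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Architectural values (Intel SDM, machine-check architecture). */
#define MSR_IA32_MCG_STATUS	0x17aU
#define MCG_STATUS_RIPV		(1ULL << 0)	/* restart IP valid */
#define MCG_STATUS_EIPV		(1ULL << 1)	/* error IP valid */
#define MCG_STATUS_MCIP		(1ULL << 2)	/* machine check in progress */

/* Hypothetical stand-in for the real MSR, which userspace cannot access. */
static uint64_t fake_mcg_status;

/* Hypothetical model of mce_wrmsrl(); only MCG_STATUS is modelled. */
static void model_wrmsrl(uint32_t msr, uint64_t val)
{
	assert(msr == MSR_IA32_MCG_STATUS);
	fake_mcg_status = val;
}

/* Hypothetical model of the matching read. */
static uint64_t model_rdmsrl(uint32_t msr)
{
	assert(msr == MSR_IA32_MCG_STATUS);
	return fake_mcg_status;
}

/* Assumption: treat "in #MC context" as MCIP still being set. */
static bool in_mc_context(void)
{
	return (model_rdmsrl(MSR_IA32_MCG_STATUS) & MCG_STATUS_MCIP) != 0;
}

/*
 * Models the step added to mce_panic() in the diff above: write 0 to
 * MCG_STATUS right before panic(), so that kexec would start with MCIP
 * clear. Returns true if the model has left #MC context afterwards.
 */
static bool model_mce_panic_clears_mcip(void)
{
	model_wrmsrl(MSR_IA32_MCG_STATUS, 0);
	return !in_mc_context();
}
```

This says nothing about the injected case: it only models the MSR write itself, not the NMI that mce-inject uses to deliver the event.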

B.R.,

Zeng Heng