Re: [PATCH v4 1/5] x86/kexec: do unconditional WBINVD for bare-metal in stop_this_cpu()

From: Tom Lendacky
Date: Thu Apr 18 2024 - 09:47:55 EST


On 4/18/24 06:48, Kai Huang wrote:


..

Signed-off-by: Kai Huang <kai.huang@xxxxxxxxx>
Suggested-by: Borislav Petkov <bp@xxxxxxxxx>
Cc: Tom Lendacky <thomas.lendacky@xxxxxxx>
Cc: Dave Young <dyoung@xxxxxxxxxx>

Reviewed-by: Tom Lendacky <thomas.lendacky@xxxxxxx>

---

v3 -> v4:
 - Update part of changelog based on Kirill's version (with minor tweak).
 - Use "exception (#VE or #VC)" for TDX and SEV-ES/SEV-SNP in changelog
   and comments. (Kirill, Tom)
 - Point out "WBINVD is not necessary for TDX and SEV-ES/SEV-SNP guests"
   in the comment. (Tom)

v2 -> v3:
- Change to only do WBINVD for bare metal

---
arch/x86/kernel/process.c | 19 +++++++++----------
1 file changed, 9 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index b8441147eb5e..d3c904bfe874 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -813,18 +813,17 @@ void __noreturn stop_this_cpu(void *dummy)
 	mcheck_cpu_clear(c);

 	/*
-	 * Use wbinvd on processors that support SME. This provides support
-	 * for performing a successful kexec when going from SME inactive
-	 * to SME active (or vice-versa). The cache must be cleared so that
-	 * if there are entries with the same physical address, both with and
-	 * without the encryption bit, they don't race each other when flushed
-	 * and potentially end up with the wrong entry being committed to
-	 * memory.
+	 * The kernel could leave caches in incoherent state on SME/TDX
+	 * capable platforms.  Flush cache to avoid silent memory
+	 * corruption for these platforms.
 	 *
-	 * Test the CPUID bit directly because the machine might've cleared
-	 * X86_FEATURE_SME due to cmdline options.
+	 * stop_this_cpu() isn't a fast path, just do WBINVD for bare-metal
+	 * to cover both SME and TDX.  It isn't necessary to perform WBINVD
+	 * in a guest and performing one could result in an exception (#VE
+	 * or #VC) for a TDX or SEV-ES/SEV-SNP guest that the guest may
+	 * not be able to handle (e.g., TDX guest panics if it sees #VE).
 	 */
-	if (c->extended_cpuid_level >= 0x8000001f && (cpuid_eax(0x8000001f) & BIT(0)))
+	if (!boot_cpu_has(X86_FEATURE_HYPERVISOR))
 		native_wbinvd();

 	/*