Re: [PATCH v19 103/130] KVM: TDX: Handle EXIT_REASON_OTHER_SMI with MSMI

From: Binbin Wu
Date: Tue May 07 2024 - 03:07:19 EST




On 4/4/2024 6:23 AM, Isaku Yamahata wrote:
On Mon, Apr 01, 2024 at 05:14:02PM +0800,
Chao Gao <chao.gao@xxxxxxxxx> wrote:

On Mon, Feb 26, 2024 at 12:26:45AM -0800, isaku.yamahata@xxxxxxxxx wrote:
From: Isaku Yamahata <isaku.yamahata@xxxxxxxxx>

When BIOS eMCA MCE-SMI morphing is enabled, the #MC is morphed to MSMI
(Machine Check System Management Interrupt). Then the SMI causes TD exit
with the read reason of EXIT_REASON_OTHER_SMI with MSMI bit set in the exit
qualification to KVM instead of EXIT_REASON_EXCEPTION_NMI with MC
exception.

Handle EXIT_REASON_OTHER_SMI with MSMI bit set in the exit qualification as
MCE(Machine Check Exception) happened during TD guest running.

Signed-off-by: Isaku Yamahata <isaku.yamahata@xxxxxxxxx>
---
arch/x86/kvm/vmx/tdx.c | 40 ++++++++++++++++++++++++++++++++++---
arch/x86/kvm/vmx/tdx_arch.h | 2 ++
2 files changed, 39 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index bdd74682b474..117c2315f087 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -916,6 +916,30 @@ void tdx_handle_exit_irqoff(struct kvm_vcpu *vcpu)
tdexit_intr_info(vcpu));
else if (exit_reason == EXIT_REASON_EXCEPTION_NMI)
vmx_handle_exception_irqoff(vcpu, tdexit_intr_info(vcpu));
+ else if (unlikely(tdx->exit_reason.non_recoverable ||
+ tdx->exit_reason.error)) {
why not just:
else if (tdx->exit_reason.basic == EXIT_REASON_OTHER_SMI) {


i.e., does EXIT_REASON_OTHER_SMI imply exit_reason.non_recoverable or
exit_reason.error?
Yes, this should be refined.


+ /*
+ * The only reason it gets EXIT_REASON_OTHER_SMI is there is an
+ * #MSMI(Machine Check System Management Interrupt) with
+ * exit_qualification bit 0 set in TD guest.
+ * The #MSMI is delivered right after SEAMCALL returns,
+ * and an #MC is delivered to host kernel after SMI handler
+ * returns.
+ *
+ * The #MC right after SEAMCALL is fixed up and skipped in #MC
Looks fixing up and skipping #MC on the first instruction after TD-exit is
missing in v19?
Right. We removed it as MSMI will provides if #MC happened in SEAM or not.

According to the patch of host #MC handler
https://lore.kernel.org/lkml/171265126376.10875.16864387954272613660.tip-bot2@tip-bot2/,
the #MC triggered by MSMI can be handled by kernel #MC handler.
There is no need to call kvm_machine_check().

Does the following fixup make sense to you?
--------

KVM: TDX: Handle EXIT_REASON_OTHER_SMI

Handle "Other SMI" VM exit for TDX.

Unlike VMX, an SMI occurs in SEAM non-root mode cause VM exit to TDX
module, then SEAMRET to KVM. Once it exits to KVM, SMI is delivered and
handled by kernel handler right away.

Specifically, when BIOS eMCA MCE-SMI morphing is enabled, the #MC occurs
in TDX guest is delivered as an Machine Check System Management Interrupt
(MSMI) with the exit reason of EXIT_REASON_OTHER_SMI with MSMI (bit 0) set
in the exit qualification.  On VM exit, TDX module checks whether the "Other
SMI" is caused by a MSMI or not.  If so, TDX module makes TD as fatal,
preventing further TD entries, and then completes the TD exit flow to KVM
with the TDH.VP.ENTER outputs indicating TDX_NON_RECOVERABLE_TD.
After TD exit, the MSMI is delivered and eventually handled by the kernel
#MC handler[1].

So, to handle "Other SMI" VM exit:
- For non-MSMI case, KVM doesn't need to do anything, just continue TDX vCPU
  execution.
- For MSMI case, since the TDX guest is dead, follow other non-recoverable
  cases, exit to userspace.

[1] The patch supports handling MSMI signaled during SEAM operation.
    It's already in tip tree.
https://lore.kernel.org/lkml/171265126376.10875.16864387954272613660.tip-bot2@tip-bot2/

diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 4ee94bfb17e2..fd756d231204 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -975,30 +975,6 @@ void tdx_handle_exit_irqoff(struct kvm_vcpu *vcpu)
tdexit_intr_info(vcpu));
        else if (exit_reason == EXIT_REASON_EXCEPTION_NMI)
                vmx_handle_exception_irqoff(vcpu, tdexit_intr_info(vcpu));
-       else if (unlikely(tdx->exit_reason.non_recoverable ||
-                tdx->exit_reason.error)) {
-               /*
-                * The only reason it gets EXIT_REASON_OTHER_SMI is there is an
-                * #MSMI(Machine Check System Management Interrupt) with
-                * exit_qualification bit 0 set in TD guest.
-                * The #MSMI is delivered right after SEAMCALL returns,
-                * and an #MC is delivered to host kernel after SMI handler
-                * returns.
-                *
-                * The #MC right after SEAMCALL is fixed up and skipped in #MC
-                * handler because it's an #MC happens in TD guest we cannot
-                * handle it with host's context.
-                *
-                * Call KVM's machine check handler explicitly here.
-                */
-               if (tdx->exit_reason.basic == EXIT_REASON_OTHER_SMI) {
-                       unsigned long exit_qual;
-
-                       exit_qual = tdexit_exit_qual(vcpu);
-                       if (exit_qual & TD_EXIT_OTHER_SMI_IS_MSMI)
-                               kvm_machine_check();
-               }
-       }
 }

 static int tdx_handle_exception(struct kvm_vcpu *vcpu)
@@ -1923,10 +1899,6 @@ int tdx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t fastpath)
                              to_kvm_tdx(vcpu->kvm)->hkid,
                              set_hkid_to_hpa(0, to_kvm_tdx(vcpu->kvm)->hkid));

-               /*
-                * tdx_handle_exit_irqoff() handled EXIT_REASON_OTHER_SMI.  It
-                * must be handled before enabling preemption because it's #MC.
-                */
                goto unhandled_exit;
        }

@@ -1970,14 +1942,14 @@ int tdx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t fastpath)
                return tdx_handle_ept_misconfig(vcpu);
        case EXIT_REASON_OTHER_SMI:
                /*
-                * Unlike VMX, all the SMI in SEAM non-root mode (i.e. when
-                * TD guest vcpu is running) will cause TD exit to TDX module,
-                * then SEAMRET to KVM. Once it exits to KVM, SMI is delivered
-                * and handled right away.
+                * Unlike VMX, SMI occurs in SEAM non-root mode (i.e. when
+                * TD guest vCPU is running) will cause VM exit to TDX module,
+                * then SEAMRET to KVM.  Once it exits to KVM, SMI is delivered
+                * and handled by kernel handler right away.
                 *
-                * - If it's an Machine Check System Management Interrupt
-                *   (MSMI), it's handled above due to non_recoverable bit set.
-                * - If it's not an MSMI, don't need to do anything here.
+                * - A MSMI will not reach here, it's handled as non_recoverable
+                *   case above.
+                * - If it's not a MSMI, no need to do anything here.
                 */
                return 1;
        default:
diff --git a/arch/x86/kvm/vmx/tdx_arch.h b/arch/x86/kvm/vmx/tdx_arch.h
index 2aecffe9f276..aa2fea7b2652 100644
--- a/arch/x86/kvm/vmx/tdx_arch.h
+++ b/arch/x86/kvm/vmx/tdx_arch.h
@@ -41,8 +41,6 @@
 #define TDH_PHYMEM_PAGE_WBINVD         41
 #define TDH_VP_WR                      43

-#define TD_EXIT_OTHER_SMI_IS_MSMI      BIT_ULL(1)
-
 /* TDX control structure (TDR/TDCS/TDVPS) field access codes */
 #define TDX_NON_ARCH                   BIT_ULL(63)
 #define TDX_CLASS_SHIFT                        56