[PATCH 8/9] KVM: SVM: Don't apply SEV+SMAP workaround on code fetch or PT access

From: Sean Christopherson
Date: Wed Jan 19 2022 - 20:08:06 EST


Resume the guest instead of synthesizing a triple fault shutdown if the
instruction bytes buffer is empty due to the #NPF being on the code fetch
itself or on a page table access. The SMAP errata applies if and only if
the code fetch was successful and ucode's subsequent data read from the
code page encountered a SMAP violation. In practice, the guest is likely
hosed either way, but crashing the guest on a code fetch to emulated MMIO
is technically wrong according to the behavior described in the APM.

Signed-off-by: Sean Christopherson <seanjc@xxxxxxxxxx>
---
arch/x86/kvm/svm/svm.c | 43 +++++++++++++++++++++++++++++++++---------
1 file changed, 34 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index d324183fc596..a4b02a6217fd 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4262,6 +4262,7 @@ static bool svm_can_emulate_instruction(struct kvm_vcpu *vcpu, int emul_type,
{
bool smep, smap, is_user;
unsigned long cr4;
+ u64 error_code;

/* Emulation is always possible when KVM has access to all guest state. */
if (!sev_guest(vcpu->kvm))
@@ -4325,22 +4326,31 @@ static bool svm_can_emulate_instruction(struct kvm_vcpu *vcpu, int emul_type,
* loap uop with CPL=0 privileges. If the load hits a SMAP #PF, ucode
* gives up and does not fill the instruction bytes buffer.
*
- * Detection:
- * KVM reaches this point if the VM is an SEV guest, the CPU supports
- * DecodeAssist, a #NPF was raised, KVM's page fault handler triggered
- * emulation (e.g. for MMIO), and the CPU returned 0 in GuestIntrBytes
- * field of the VMCB.
+ * As above, KVM reaches this point iff the VM is an SEV guest, the CPU
+ * supports DecodeAssist, a #NPF was raised, KVM's page fault handler
+ * triggered emulation (e.g. for MMIO), and the CPU returned 0 in the
+ * GuestIntrBytes field of the VMCB.
*
* This does _not_ mean that the erratum has been encountered, as the
* DecodeAssist will also fail if the load for CS:RIP hits a legitimate
* #PF, e.g. if the guest attempt to execute from emulated MMIO and
* encountered a reserved/not-present #PF.
*
- * To reduce the likelihood of false positives, take action if and only
- * if CR4.SMAP=1 (obviously required to hit the erratum) and CR4.SMEP=0
- * or CPL=3. If SMEP=1 and CPL!=3, the erratum cannot have been hit as
- * the guest would have encountered a SMEP violation #PF, not a #NPF.
+ * To hit the erratum, the following conditions must be true:
+ * 1. CR4.SMAP=1 (obviously).
+ * 2. CR4.SMEP=0 || CPL=3. If SMEP=1 and CPL<3, the erratum cannot
+ * have been hit as the guest would have encountered a SMEP
+ * violation #PF, not a #NPF.
+ * 3. The #NPF is not due to a code fetch, in which case failure to
+ * retrieve the instruction bytes is legitimate (see abvoe).
+ *
+ * In addition, don't apply the erratum workaround if the #NPF occurred
+ * while translating guest page tables (see below).
*/
+ error_code = to_svm(vcpu)->vmcb->control.exit_info_1;
+ if (error_code & (PFERR_GUEST_PAGE_MASK | PFERR_FETCH_MASK))
+ goto resume_guest;
+
cr4 = kvm_read_cr4(vcpu);
smep = cr4 & X86_CR4_SMEP;
smap = cr4 & X86_CR4_SMAP;
@@ -4350,6 +4360,21 @@ static bool svm_can_emulate_instruction(struct kvm_vcpu *vcpu, int emul_type,
kvm_make_request(KVM_REQ_TRIPLE_FAULT, vcpu);
}

+resume_guest:
+ /*
+ * If the erratum was not hit, simply resume the guest and let it fault
+ * again. While awful, e.g. the vCPU may get stuck in an infinite loop
+ * if the fault is at CPL=0, it's the lesser of all evils. Exiting to
+ * userspace will kill the guest, and letting the emulator read garbage
+ * will yield random behavior and potentially corrupt the guest.
+ *
+ * Simply resuming the guest is technically not a violation of the SEV
+ * architecture. AMD's APM states that all code fetches and page table
+ * accesses for SEV guest are encrypted, regardless of the C-Bit. The
+ * APM also states that encrypted accesses to MMIO are "ignored", but
+ * doesn't explicitly define "ignored", i.e. doing nothing and letting
+ * the guest spin is technically "ignoring" the access.
+ */
return false;
}

--
2.34.1.703.g22d0c6ccf7-goog