[RFC PATCH v2 7/7] x86/fault: Handle RMP faults with 0 address when nested

From: Jeremi Piotrowski
Date: Mon Feb 13 2023 - 05:35:04 EST


When using SNP, accessing an encrypted guest page from the host triggers
an RMP fault. The page fault handling code currently handles this by
looking up the corresponding RMP entry. If the same access happens under
nested virtualization, the L0 hypervisor sees a #NPF, but the CPU does
not provide the faulting address if it was running at L1 at the time of
the fault.

This happens on Hyper-V when using nested SNP guests. Hyper-V has no
choice but to use a placeholder address (0) when injecting the page
fault to L1. We need to handle this, and the only sane thing to do is to
forward a SIGBUS to the task.

One path where this happens is when the SNP guest issues a
KVM_HC_CLOCK_PAIRING hypercall, which leads to KVM calling
kvm_write_guest() on a guest-supplied address. This results in the
following backtrace:

[ 191.862660] exc_page_fault+0x71/0x170
[ 191.862664] asm_exc_page_fault+0x2c/0x40
[ 191.862666] RIP: 0010:copy_user_enhanced_fast_string+0xa/0x40
...
[ 191.862677] ? __kvm_write_guest_page+0x6e/0xa0 [kvm]
[ 191.862700] kvm_write_guest_page+0x52/0xc0 [kvm]
[ 191.862788] kvm_write_guest+0x44/0x80 [kvm]
[ 191.862807] kvm_emulate_hypercall+0x1ca/0x5a0 [kvm]
[ 191.862830] ? kvm_emulate_monitor+0x40/0x40 [kvm]
[ 191.862849] svm_invoke_exit_handler+0x74/0x180 [kvm_amd]
[ 191.862854] sev_handle_vmgexit+0xf42/0x17f0 [kvm_amd]
[ 191.862858] ? __this_cpu_preempt_check+0x13/0x20
[ 191.862860] ? sev_post_map_gfn+0xf0/0xf0 [kvm_amd]
[ 191.862863] svm_invoke_exit_handler+0x74/0x180 [kvm_amd]
[ 191.862866] svm_handle_exit+0xb5/0x2b0 [kvm_amd]
[ 191.862869] kvm_arch_vcpu_ioctl_run+0x12a8/0x1aa0 [kvm]
[ 191.862891] kvm_vcpu_ioctl+0x24f/0x6d0 [kvm]
[ 191.862910] ? kvm_vm_ioctl_irq_line+0x27/0x40 [kvm]
[ 191.862929] ? _copy_to_user+0x25/0x30
[ 191.862932] ? kvm_vm_ioctl+0x291/0xea0 [kvm]
[ 191.862951] ? kvm_vm_ioctl+0x291/0xea0 [kvm]
[ 191.862970] ? __fget_light+0xc5/0x100
[ 191.862972] __x64_sys_ioctl+0x91/0xc0
[ 191.862975] do_syscall_64+0x5c/0x80
[ 191.862976] ? exit_to_user_mode_prepare+0x53/0x240
[ 191.862978] ? syscall_exit_to_user_mode+0x17/0x40
[ 191.862980] ? do_syscall_64+0x69/0x80
[ 191.862981] ? do_syscall_64+0x69/0x80
[ 191.862982] ? syscall_exit_to_user_mode+0x17/0x40
[ 191.862983] ? do_syscall_64+0x69/0x80
[ 191.862984] ? syscall_exit_to_user_mode+0x17/0x40
[ 191.862985] ? do_syscall_64+0x69/0x80
[ 191.862986] ? do_syscall_64+0x69/0x80
[ 191.862987] entry_SYSCALL_64_after_hwframe+0x46/0xb0

Without this fix, the handler returns without doing anything, and the
result is a soft lockup of the CPU.

Signed-off-by: Jeremi Piotrowski <jpiotrowski@xxxxxxxxxxxxxxxxxxx>
---
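Illustration only, not part of the patch: the behaviour described in the
commit message can be sketched as a small user-space model. Everything
named model_* below is hypothetical and is not kernel API. The model
assumes that an unresolved fault simply re-executes the faulting access,
which is one plausible reading of why the CPU soft-locks without the fix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical outcomes of the modelled RMP fault handler. */
enum model_result {
	MODEL_LOOKUP,	/* non-zero address: continue to the normal RMP lookup */
	MODEL_RETRY,	/* handler returned without changing anything */
	MODEL_SIGBUS,	/* task is sent SIGBUS, the access is abandoned */
};

/*
 * Model of the early check added by the patch. A zero address reported
 * under Hyper-V cannot be resolved to an RMP entry, so with the fix the
 * task is signalled. Without the fix, the model assumes the handler
 * returns without effect, as the commit message describes.
 */
static enum model_result model_rmp_fault(uint64_t address, bool on_hyperv,
					 bool have_fix)
{
	if (have_fix && !address && on_hyperv)
		return MODEL_SIGBUS;
	if (!address)
		return MODEL_RETRY;
	return MODEL_LOOKUP;
}

/*
 * Re-run the faulting access until it is resolved, the task is signalled,
 * or max_faults is reached. The bound stands in for the soft-lockup
 * watchdog; the real loop would have no such limit.
 * Returns the number of faults taken.
 */
static unsigned int model_access(uint64_t address, bool on_hyperv,
				 bool have_fix, unsigned int max_faults,
				 bool *signalled)
{
	unsigned int faults = 0;

	assert(signalled != NULL);
	*signalled = false;

	while (faults < max_faults) {
		enum model_result res;

		faults++;
		res = model_rmp_fault(address, on_hyperv, have_fix);
		if (res == MODEL_SIGBUS) {
			*signalled = true;
			break;
		}
		if (res == MODEL_LOOKUP)
			break;	/* handled by the normal path, outside this model */
	}

	return faults;
}
```

In this model, model_access(0, true, false, N, &s) takes all N faults
without ever signalling, while model_access(0, true, true, N, &s) stops
after the first fault with s set.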
arch/x86/mm/fault.c | 14 ++++++++++++++
1 file changed, 14 insertions(+)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index f2b16dcfbd9a..8706fd34f3a9 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -34,6 +34,7 @@
 #include <asm/vdso.h> /* fixup_vdso_exception() */
 #include <asm/irq_stack.h>
 #include <asm/sev.h> /* snp_lookup_rmpentry() */
+#include <asm/hypervisor.h> /* hypervisor_is_type() */

 #define CREATE_TRACE_POINTS
 #include <asm/trace/exceptions.h>
@@ -1282,6 +1283,18 @@ static int handle_user_rmp_page_fault(struct pt_regs *regs, unsigned long error_
 	pte_t *pte;
 	u64 pfn;

+	/*
+	 * When an rmp fault occurs while not inside the SNP guest, the L0
+	 * hypervisor sees a NPF and does not have access to the address that
+	 * caused the fault to forward to L1 hypervisor. Hyper-V places a 0 in
+	 * the PF as a placeholder. SIGBUS the task since there's nothing
+	 * better that we can do.
+	 */
+	if (!address && hypervisor_is_type(X86_HYPER_MS_HYPERV)) {
+		do_sigbus(regs, error_code, address, VM_FAULT_SIGBUS);
+		return 1;
+	}
+
 	pgd = __va(read_cr3_pa());
 	pgd += pgd_index(address);

--
2.25.1