Re: [PATCH] x86: kvm: Avoid guest page table walk when gpa_available is set

From: Brijesh Singh
Date: Tue Apr 25 2017 - 18:03:54 EST




I also wanted to avoid adding yet another variable but we can't depend on
cr2 parameters passed into x86_emulate_instruction().

The x86_emulate_instruction() function is called from two places:

1) handling the page-fault.
pf_interception [svm.c]
kvm_mmu_page_fault [mmu.c]
x86_emulate_instruction [x86.c]

2) completing the IO/MMIO's from previous instruction decode
kvm_arch_vcpu_ioctl_run
complete_emulated_io
emulate_instruction
x86_emulate_instruction(vcpu, 0, emulation_type, NULL, 0)

In #1, we are guaranteed that cr2 variable will contain a valid GPA but
in #2, CR2 is set to zero.

We are setting up the completion in #1 x86_emulate_instruction(), where
the gpa (cr2) is available, so we could store the value while arming
vcpu->arch.complete_userspace_io.

emulator_read_write_onepage() already saves gpa in frag->gpa, which is
then passed into complete_emulated_mmio -- isn't that mechanism
sufficient?


I see that complete_emulated_mmio() saves the frag>gpa into run->mmio.phys_addr,
so based on the exit_reason we should be able to get the saved gpa.
In my debug patch below, I tried doing something similar to verify that frag->gpa
contains the valid CR2 value but I saw a bunch of mismatch. So it seems like we
may not able to use frag->gpa mechanism. Additionally we also need to handle the
PIO cases (e.g what if we are called from complete_emulated_pio), which also takes
similar code path

complete_emulated_pio
completed_emulated_io
emulate_instruction
x86_emulate_instruction(vcpu, 0, emulation_type, NULL, 0)

@@ -5682,13 +5686,20 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu,
restart:
/*
- * Save the faulting GPA (cr2) in the address field
- * NOTE: If gpa_available is set then gpa_val will contain a valid GPA
+ * if previous exit was due to userspace mmio completion then actual
+ * cr2 is stored in mmio.phys_addr.
*/
- if (vcpu->arch.gpa_available)
- ctxt->exception.address = vcpu->arch.gpa_val;
- else
- ctxt->exception.address = cr2;
+ if (vcpu->run->exit_reason == KVM_EXIT_MMIO) {
+ cr2 = vcpu->run->mmio.phys_addr;
+ if (cr2 != vcpu->arch.gpa_val)
+ pr_err("** mismatch %llx %llx\n",
+ vcpu->run->mmio.phys_addr, vcpu->arch.gpa_val);
+ }
+
+ /* Save the faulting GPA (cr2) in the address field */
+ ctxt->exception.address = cr2;



handle_exit [svm.c]
pf_interception [svm.c]
/* it invokes the fault handler with CR2 = svm->vmcb->control.exit_info_2 */
kvm_mmu_page_fault [mmu.c]
x86_emulate_instruction [x86.c]
emulator_read_write_onepage [x86.c]
/*
*this is where we walk the guest page table to translate
* a GVA to GPA. If gpa_available is set then we use the
* gpa_val instead of walking the pgtable.
*/

pf_interception is the NPF exit handler -- please move the setting
there, at least. handle_exit() is a hot path that shouldn't contain
code that isn't applicable to all exits.


Sure, Will do.

Btw. we need some other guarantees to declare it as GPA (cr2 is GPA in
NPT exits, but might not be in other) ... isn't arch.mmu.direct_map a
condition we are interested in?

The other code uses it to interpret cr2 directly as gpa, so we might be
able to avoid setting the arch.gpa_available in a hot path too.


Hmm looking at the call trace I am not sure how arch.mmu_direct_map will help
but I will investigate a bit more.


See my previous comment. In some cases CR2 may be set to zero
(e.g when completing the instruction from previous io/mmio page-fault).

If we are decide to add the gpa_val then we can remove above if
statement from x86_emulate_instruction() and update emulator_read_write_onepage
to use the vcpu->arch.gpa_val instead of exception->address.

Yeah, that would be nicer than setting exception->address at a random
place.

We could also pass the value as cr2 in all cases if it made something
better.

if (vcpu->arch.gpa_available &&
emulator_can_use_gpa(ctxt) &&
(addr & ~PAGE_MASK) == (exception->address & ~PAGE_MASK)) {
gpa = vcpu=>arch.gpa_val;
...
...
}

If at all possible, I'd like to have the gpa passed with other relevant
data, instead of having it isolated like this ... and we can't manage
that, then at least good benchmark results to excuse the bad code.


I ran two tests to see how many times we walk guest page table.

Test1: run kvm-unit-test
Test2: launch Ubuntu 16.06 guest, run a stressapptest for 20 seconds, shutdown the VM

Before patch
* Test1: 10419
* Test2: 243365

After patch:
* Test1: 1259
* Test2: 1221

Please let me know if you want me to run other other benchmark and capture the data.

-Brijesh