Re: [PATCH] x86: kvm: Avoid guest page table walk when gpa_available is set

From: Radim KrÄmÃÅ
Date: Wed Apr 26 2017 - 16:44:42 EST


2017-04-25 17:02-0500, Brijesh Singh:
> > > I also wanted to avoid adding yet another variable but we can't depend on
> > > cr2 parameters passed into x86_emulate_instruction().
> > >
> > > The x86_emulate_instruction() function is called from two places:
> > >
> > > 1) handling the page-fault.
> > > pf_interception [svm.c]
> > > kvm_mmu_page_fault [mmu.c]
> > > x86_emulate_instruction [x86.c]
> > >
> > > 2) completing the IO/MMIO's from previous instruction decode
> > > kvm_arch_vcpu_ioctl_run
> > > complete_emulated_io
> > > emulate_instruction
> > > x86_emulate_instruction(vcpu, 0, emulation_type, NULL, 0)
> > >
> > > In #1, we are guaranteed that cr2 variable will contain a valid GPA but
> > > in #2, CR2 is set to zero.
> >
> > We are setting up the completion in #1 x86_emulate_instruction(), where
> > the gpa (cr2) is available, so we could store the value while arming
> > vcpu->arch.complete_userspace_io.
> >
> > emulator_read_write_onepage() already saves gpa in frag->gpa, which is
> > then passed into complete_emulated_mmio -- isn't that mechanism
> > sufficient?
> >
>
> I see that complete_emulated_mmio() saves the frag>gpa into run->mmio.phys_addr,
> so based on the exit_reason we should be able to get the saved gpa.

I mean, we could pass the frag->gpa into x86_emulate_instruction().
I think that exit_reason is related just accidentally and relying on it
inside x86_emulate_instruction() is bad.

> In my debug patch below, I tried doing something similar to verify that frag->gpa
> contains the valid CR2 value but I saw a bunch of mismatch. So it seems like we
> may not able to use frag->gpa mechanism. Additionally we also need to handle the
> PIO cases (e.g what if we are called from complete_emulated_pio), which also takes
> similar code path

Yes, but PIO should trigger a NPF exit relatively rarely and
generalizing the method from MMIO will be easy.

> complete_emulated_pio
> completed_emulated_io
> emulate_instruction
> x86_emulate_instruction(vcpu, 0, emulation_type, NULL, 0)
>
> @@ -5682,13 +5686,20 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu,
> restart:
> /*
> - * Save the faulting GPA (cr2) in the address field
> - * NOTE: If gpa_available is set then gpa_val will contain a valid GPA
> + * if previous exit was due to userspace mmio completion then actual
> + * cr2 is stored in mmio.phys_addr.
> */
> - if (vcpu->arch.gpa_available)
> - ctxt->exception.address = vcpu->arch.gpa_val;
> - else
> - ctxt->exception.address = cr2;
> + if (vcpu->run->exit_reason == KVM_EXIT_MMIO) {
> + cr2 = vcpu->run->mmio.phys_addr;
> + if (cr2 != vcpu->arch.gpa_val)
> + pr_err("** mismatch %llx %llx\n",
> + vcpu->run->mmio.phys_addr, vcpu->arch.gpa_val);

This will probably return false negatives when then vcpu->arch.gpa_val
couldn't be used anyway (possibly after a VM exits), so it is hard to
draw a conclusion.

I would really like if we had a solution that passed the gpa inside
function parameters.

(Btw. cr2 can also be a virtual address, so we can call it gpa directly)

> > If at all possible, I'd like to have the gpa passed with other relevant
> > data, instead of having it isolated like this ... and we can't manage
> > that, then at least good benchmark results to excuse the bad code.
> >
>
> I ran two tests to see how many times we walk guest page table.
>
> Test1: run kvm-unit-test
> Test2: launch Ubuntu 16.06 guest, run a stressapptest for 20 seconds, shutdown the VM
>
> Before patch
> * Test1: 10419
> * Test2: 243365
>
> After patch:
> * Test1: 1259
> * Test2: 1221
>
> Please let me know if you want me to run other other benchmark and capture the data.

Looks convincing, thanks.