Re: [PATCH] kvm: x86: Improve virtual machine startup performance

From: Hao Peng
Date: Wed Mar 02 2022 - 21:58:36 EST


On Thu, Mar 3, 2022 at 9:29 AM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
>
> On Wed, Mar 02, 2022, Hao Peng wrote:
> > On Wed, Mar 2, 2022 at 1:54 AM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
> > >
> > > On Tue, Mar 01, 2022, Peng Hao wrote:
> > > > From: Peng Hao <flyingpeng@xxxxxxxxxxx>
> > > >
> > > > vcpu 0 will repeatedly enter/exit the smm state during the startup
> > > > phase, and kvm_init_mmu will be called repeatedly during this process.
> > > > There are parts of the mmu initialization code that do not need to be
> > > > modified after the first initialization.
> > > >
> > > > Statistics on my server, vcpu0 when starting the virtual machine
> > > > Calling kvm_init_mmu more than 600 times (due to smm state switching).
> > > > The patch can save about 36 microseconds in total.
> > > >
> > > > Signed-off-by: Peng Hao <flyingpeng@xxxxxxxxxxx>
> > > > ---
> > > > @@ -5054,7 +5059,7 @@ void kvm_mmu_after_set_cpuid(struct kvm_vcpu *vcpu)
> > > > void kvm_mmu_reset_context(struct kvm_vcpu *vcpu)
> > > > {
> > > > kvm_mmu_unload(vcpu);
> > > > - kvm_init_mmu(vcpu);
> > > > + kvm_init_mmu(vcpu, false);
> > >
> > > This is wrong, kvm_mmu_reset_context() is the "big hammer" and is expected to
> > > unconditionally get the MMU to a known good state. E.g. failure to initialize
> > > means this code:
> > >
> > > context->shadow_root_level = kvm_mmu_get_tdp_level(vcpu);
> > >
> > > will not update the shadow_root_level as expected in response to userspace changing
> > > guest.MAXPHYADDR in such a way that KVM enables/disables 5-level paging.
> > >
> > Thanks for pointing this out. However, other than shadow_root_level,
> > other fields of context will not
> > change during the entire operation, such as
> > page_fault/sync_page/direct_map and so on under
> > the condition of tdp_mmu.
> > Is this patch still viable after careful confirmation of the fields
> > that won't be modified?
>
> No, passing around the "init" flag is a hack.
>
> But, we can achieve what you want simply by initializing the constant data once
> per vCPU. There's a _lot_ of state that is constant for a given MMU now that KVM
> uses separate MMUs for L1 vs. L2 when TDP is enabled. I should get patches posted
> tomorrow, just need to test (famous last words).
>
> Also, based on the number of SMM transitions, I'm guessing you're using SeaBIOS.
> Have you tried disabling CONFIG_CALL32_SMM, or CONFIG_USE_SMM altogether? That
> might be an even better way to improve performance in your environment.
>

Both options are disabled in guest.
> Last question, do you happen to know why eliminating this code shaves 36us? The
> raw writes don't seem like they'd take that long. Maybe the writes to function
> pointers trigger stalls or mispredicts or something? If you don't have an easy
> answer, don't bother investigating, I'm just curious.

I'm guessing it's because of the cache. At first, I wanted to replace
it with memcpy, if the modified fields are continuous enough, I can
use instructions such as erms/fsrm.