Re: [PATCH 3/3] KVM: x86: do not unload MMU roots when only toggling CR0.WP

From: Sean Christopherson
Date: Tue Jan 17 2023 - 18:37:21 EST


On Tue, Jan 17, 2023, Mathias Krause wrote:
> There is no need to unload the MMU roots when only CR0.WP has changed --
> the paging structures are still valid, only the permission bitmap needs
> to be updated.

This doesn't hold true when KVM is using shadow paging, in which case CR0.WP
affects the shadow page tables. I believe that also holds true for nNPT :-(

nEPT doesn't consume CR0.WP so we could expedite that case as well, though
identifying that case might be annoying.

> Change kvm_mmu_reset_context() to get passed the need for unloading MMU
> roots and explicitly avoid it if only CR0.WP was toggled on a CR0 write
> caused VMEXIT.

One thing we should explore on top of this is not intercepting CR0.WP (on Intel)
when TDP is enabled. It could even trigger after toggling CR0.WP N times, e.g.
to optimize the grsecurity use case without negatively impacting workloads with
a static CR0.WP, as walking guest memory would require an "extra" VMREAD to get
CR0.WP in that case.

Unfortunately, AMD doesn't provide per-bit controls.

> This change brings a huge performance gain as the following micro-
> benchmark running 'ssdd 10 50000' from rt-tests[1] on a grsecurity L1 VM
> shows (runtime in seconds, lower is better):
>
> legacy MMU TDP MMU
> kvm.git/queue 11.55s 13.91s
> kvm.git/queue+patch 7.44s 7.94s
>
> For legacy MMU this is ~35% faster, for TTP MMU ~43% faster.
>
> [1] https://git.kernel.org/pub/scm/utils/rt-tests/rt-tests.git
>
> Signed-off-by: Mathias Krause <minipli@xxxxxxxxxxxxxx>
> ---

...

> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 508074e47bc0..d7c326ab94de 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -902,7 +902,9 @@ EXPORT_SYMBOL_GPL(load_pdptrs);
>
> void kvm_post_set_cr0(struct kvm_vcpu *vcpu, unsigned long old_cr0, unsigned long cr0)
> {
> - if ((cr0 ^ old_cr0) & X86_CR0_PG) {
> + unsigned long cr0_change = cr0 ^ old_cr0;
> +
> + if (cr0_change & X86_CR0_PG) {
> kvm_clear_async_pf_completion_queue(vcpu);
> kvm_async_pf_hash_reset(vcpu);
>
> @@ -914,10 +916,18 @@ void kvm_post_set_cr0(struct kvm_vcpu *vcpu, unsigned long old_cr0, unsigned lon
> kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu);
> }
>
> - if ((cr0 ^ old_cr0) & KVM_MMU_CR0_ROLE_BITS)
> - kvm_mmu_reset_context(vcpu);
> + if (cr0_change & KVM_MMU_CR0_ROLE_BITS) {
> + bool unload_mmu =
> + cr0_change & (KVM_MMU_CR0_ROLE_BITS & ~X86_CR0_WP);

As above, this needs to guarded with a check that the MMU is direct. And rather
than add a flag to kvm_mmu_reset_context(), just call kvm_init_mmu() directly.
E.g. I think this would work?

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index d07563d0e204..8f9fac6d81d2 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -927,6 +927,11 @@ EXPORT_SYMBOL_GPL(load_pdptrs);

void kvm_post_set_cr0(struct kvm_vcpu *vcpu, unsigned long old_cr0, unsigned long cr0)
{
+ if (vcpu->arch.mmu->root_role.direct && (cr0 ^ old_cr0) == X86_CR0_WP) {
+ kvm_init_mmu(vcpu);
+ return;
+ }
+
if ((cr0 ^ old_cr0) & X86_CR0_PG) {
kvm_clear_async_pf_completion_queue(vcpu);
kvm_async_pf_hash_reset(vcpu);