Re: [PATCH 2/6] KVM: x86/mmu: Properly account NX huge page workaround for nonpaging MMUs

From: Sean Christopherson
Date: Mon Apr 11 2022 - 14:33:24 EST


On Mon, Apr 11, 2022, Mingwei Zhang wrote:
> On Sat, Apr 09, 2022, Sean Christopherson wrote:
> > diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
> > index 671cfeccf04e..89df062d5921 100644
> > --- a/arch/x86/kvm/mmu.h
> > +++ b/arch/x86/kvm/mmu.h
> > @@ -191,6 +191,15 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
> > .user = err & PFERR_USER_MASK,
> > .prefetch = prefetch,
> > .is_tdp = likely(vcpu->arch.mmu->page_fault == kvm_tdp_page_fault),
> > +
> > + /*
> > + * Note, enforcing the NX huge page mitigation for nonpaging
> > + * MMUs (shadow paging, CR0.PG=0 in the guest) is completely
> > + * unnecessary. The guest doesn't have any page tables to
> > + * abuse and is guaranteed to switch to a different MMU when
> > + * CR0.PG is toggled on (may not always be guaranteed when KVM
> > + * is using TDP). See make_spte() for details.
> > + */
> > .nx_huge_page_workaround_enabled = is_nx_huge_page_enabled(),
>
> hmm. I think there could be a minor issue here (even in original code).
> The nx_huge_page_workaround_enabled is attached here with page fault.
> However, at the time of make_spte(), we call is_nx_huge_page_enabled()
> again. Since this function will directly check the module parameter,
> there might be a race condition here. e.g., at the time of page fault,
> the workaround was 'true', while by the time we reach make_spte(), the
> parameter was set to 'false'.

Toggling the mitigation invalidates and zaps all roots. Any page fault that
acquires mmu_lock after the toggle is guaranteed to see the correct value, and
anything installed by a page fault that completed before kvm_mmu_zap_all_fast()
is guaranteed to be zapped.
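
For illustration only, here is a userspace sketch of that ordering argument. It
is not KVM code: every model_* name is hypothetical, the generation counter
stands in for invalidating and zapping all roots, and locking is reduced to
comments, so each function should be read as one critical section under
mmu_lock.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names exist in KVM. */
struct model_mmu {
	bool nx_enabled;         /* stands in for the module parameter */
	unsigned long valid_gen; /* bumped whenever all roots are zapped */
};

struct model_spte {
	bool nx_applied;         /* mitigation state used to build the SPTE */
	unsigned long gen;       /* generation the SPTE was installed under */
};

/* Toggle the knob, then invalidate everything built before the toggle. */
static void model_set_nx(struct model_mmu *mmu, bool enabled)
{
	mmu->nx_enabled = enabled;
	mmu->valid_gen++;        /* models the zap done under mmu_lock */
}

/* Models the part of a fault that runs with mmu_lock held. */
static struct model_spte model_install(const struct model_mmu *mmu)
{
	struct model_spte spte = {
		.nx_applied = mmu->nx_enabled,
		.gen = mmu->valid_gen,
	};
	return spte;
}

/* An SPTE survives only if no zap happened after it was installed. */
static bool model_spte_is_live(const struct model_mmu *mmu,
			       const struct model_spte *spte)
{
	return spte->gen == mmu->valid_gen;
}

/* Walks through both halves of the argument; returns 0 on success. */
static int model_demo(void)
{
	struct model_mmu mmu = { .nx_enabled = true, .valid_gen = 0 };
	struct model_spte before = model_install(&mmu);

	model_set_nx(&mmu, false);

	struct model_spte after = model_install(&mmu);

	assert(!model_spte_is_live(&mmu, &before)); /* old SPTE was zapped */
	assert(model_spte_is_live(&mmu, &after));   /* new SPTE is current */
	assert(!after.nx_applied);                  /* and saw the new value */
	return 0;
}
```

In this model an SPTE built with the old value can exist, but it never
survives the toggle, which is the property the argument above relies on.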

> I have not figured out what the side effect is. But I feel like the
> make_spte() should just follow the information in kvm_page_fault instead
> of directly querying the global config.

I started down this exact path :-) The problem is that, even without Ben's series,
KVM uses make_spte() for things other than page faults.