Re: [PATCH v4 18/30] KVM: x86/mmu: Zap only TDP MMU leafs in kvm_zap_gfn_range()

From: Sean Christopherson
Date: Fri Mar 04 2022 - 11:11:13 EST


On Fri, Mar 04, 2022, Mingwei Zhang wrote:
> On Thu, Mar 03, 2022, Paolo Bonzini wrote:
> > diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> > index f3939ce4a115..c71debdbc732 100644
> > --- a/arch/x86/kvm/mmu/tdp_mmu.c
> > +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> > @@ -834,10 +834,8 @@ bool kvm_tdp_mmu_zap_sp(struct kvm *kvm, struct kvm_mmu_page *sp)
> > }
> >
> > /*
> > - * Tears down the mappings for the range of gfns, [start, end), and frees the
> > - * non-root pages mapping GFNs strictly within that range. Returns true if
> > - * SPTEs have been cleared and a TLB flush is needed before releasing the
> > - * MMU lock.
> > + * Zap leafs SPTEs for the range of gfns, [start, end). Returns true if SPTEs
> > + * have been cleared and a TLB flush is needed before releasing the MMU lock.
>
> I think the original code does not do _over_ zapping. But the new version
> does.

No, the new version doesn't overzap.

> Will that have some side effects? In particular, if the range is
> within a huge page (or HugeTLB page of various sizes), then we choose to
> zap it even if it is more than the range.

The old version did that too. KVM _must_ zap a hugepage that overlaps the range,
otherwise the guest would be able to access memory that has been freed/moved. If
the operation has unmapped a subset of a hugepage, KVM needs to zap and rebuild
the portions that are still valid using smaller pages.
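The overlap rule can be shown with a small standalone sketch (not actual KVM
code; the helper name and the 2MiB page-count constant are illustrative):

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t gfn_t;

/* Illustrative constant: a 2MiB hugepage spans 512 4KiB GFNs. */
#define PAGES_PER_HPAGE 512

/*
 * A hugepage must be zapped if it overlaps the zap range [start, end)
 * at all: leaving it in place would let the guest keep accessing GFNs
 * inside the range through the stale large mapping.  The portions of
 * the hugepage outside the range are rebuilt later with smaller pages.
 */
static bool hugepage_must_zap(gfn_t hpage_base, gfn_t start, gfn_t end)
{
	gfn_t hpage_end = hpage_base + PAGES_PER_HPAGE;

	return hpage_base < end && hpage_end > start;
}
```

Note the check is pure overlap, not containment: a range entirely inside the
hugepage still forces the zap.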

> Regardless of side effect, I think we probably should mention that in
> the comments?
> > - /*
> > - * If this is a non-last-level SPTE that covers a larger range
> > - * than should be zapped, continue, and zap the mappings at a
> > - * lower level, except when zapping all SPTEs.
> > - */
> > - if (!zap_all &&
> > - (iter.gfn < start ||
> > - iter.gfn + KVM_PAGES_PER_HPAGE(iter.level) > end) &&
> > + if (!is_shadow_present_pte(iter.old_spte) ||
> > !is_last_spte(iter.old_spte, iter.level))

It's hard to see in the diff, but the key is the "!is_last_spte()" check. The
old check skipped non-leaf SPTEs, a.k.a. shadow pages, only if they fell
outside the range being zapped. The new version _always_ skips shadow pages.
Hugepages are leaf SPTEs, so is_last_spte() always returns true for them and
they are never skipped.
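The new skip condition boils down to "ignore anything that is not a present
leaf SPTE". A toy model (the struct and helper below are stand-ins for KVM's
is_shadow_present_pte()/is_last_spte(), not the real types):

```c
#include <stdbool.h>

/* Hypothetical stand-in for an SPTE and its KVM accessors. */
struct spte {
	bool present;	/* is_shadow_present_pte() */
	bool last;	/* is_last_spte(): leaf, which includes hugepages */
};

/*
 * Mirrors the new loop body: skip non-present entries and non-leaf
 * SPTEs (shadow pages) unconditionally.  A hugepage is a leaf, so it
 * is never skipped and thus always zapped when the iterator reaches
 * it, regardless of how much of it lies inside the zap range.
 */
static bool skip_spte(const struct spte *s)
{
	return !s->present || !s->last;
}
```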