Re: [PATCH v2 4/7] KVM: x86/mmu: Zap only obsolete roots if a root shadow page is zapped

From: Paolo Bonzini
Date: Tue Mar 01 2022 - 12:55:25 EST


On 2/25/22 19:22, Sean Christopherson wrote:
@@ -5656,7 +5707,7 @@ static void kvm_mmu_zap_all_fast(struct kvm *kvm)
* Note: we need to do this under the protection of mmu_lock,
* otherwise, vcpu would purge shadow page but miss tlb flush.
*/
- kvm_make_all_cpus_request(kvm, KVM_REQ_MMU_RELOAD);
+ kvm_make_all_cpus_request(kvm, KVM_REQ_MMU_FREE_OBSOLETE_ROOTS);

I was going to squash in this:

* invalidating TDP MMU roots must be done while holding mmu_lock for
- * write and in the same critical section as making the reload request,
+ * write and in the same critical section as making the free request,
* e.g. before kvm_zap_obsolete_pages() could drop mmu_lock and yield.

But then I realized that this needs better comments and that my knowledge of
this has serious holes. Regarding this comment, this is my proposal:

/*
* Invalidated TDP MMU roots are zapped within MMU read_lock to be
* able to walk the list of roots, but with the expectation of no
* concurrent change to the pages themselves. There cannot be
* any yield between kvm_tdp_mmu_invalidate_all_roots and the free
* request, otherwise somebody could grab a reference to the root
* and break that assumption.
*/
if (is_tdp_mmu_enabled(kvm))
kvm_tdp_mmu_invalidate_all_roots(kvm);

However, for the second comment (the one in the context above), there's much
more. From easier to harder:

1) I'm basically clueless about the TLB flush "note" above.

2) It's not clear to me what needs to use for_each_tdp_mmu_root; for
example, why would anything but the MMU notifiers use for_each_tdp_mmu_root?
It is used in kvm_tdp_mmu_write_protect_gfn, kvm_tdp_mmu_try_split_huge_pages
and kvm_tdp_mmu_clear_dirty_pt_masked.

3) Does it make sense that yielding users of for_each_tdp_mmu_root must
either look at valid roots only, or take MMU lock for write? If so, can
this be enforced in tdp_mmu_next_root?

4) If the previous point is correct, _who_ could grab a reference and
not release it before kvm_tdp_mmu_zap_invalidated_roots runs? That is,
is "somebody could grab a reference" an accurate explanation in the first
comment above?

Thanks,

Paolo