Re: [PATCH v2 4/7] KVM: x86/mmu: Zap only obsolete roots if a root shadow page is zapped

From: Paolo Bonzini
Date: Wed Mar 02 2022 - 13:04:55 EST


On 3/1/22 18:55, Paolo Bonzini wrote:
On 2/25/22 19:22, Sean Christopherson wrote:
@@ -5656,7 +5707,7 @@ static void kvm_mmu_zap_all_fast(struct kvm *kvm)
       * Note: we need to do this under the protection of mmu_lock,
       * otherwise, vcpu would purge shadow page but miss tlb flush.
       */
-    kvm_make_all_cpus_request(kvm, KVM_REQ_MMU_RELOAD);
+    kvm_make_all_cpus_request(kvm, KVM_REQ_MMU_FREE_OBSOLETE_ROOTS);

I was going to squash in this:

      * invalidating TDP MMU roots must be done while holding mmu_lock for
-     * write and in the same critical section as making the reload request,
+     * write and in the same critical section as making the free request,
      * e.g. before kvm_zap_obsolete_pages() could drop mmu_lock and yield.

But then I realized that this needs better comments and that my knowledge of
this has serious holes.  Regarding this comment, this is my proposal:

        /*
         * Invalidated TDP MMU roots are zapped within MMU read_lock to be
         * able to walk the list of roots, but with the expectation of no
         * concurrent change to the pages themselves.  There cannot be
         * any yield between kvm_tdp_mmu_invalidate_all_roots and the free
         * request, otherwise somebody could grab a reference to the root
     * and break that assumption.
         */
        if (is_tdp_mmu_enabled(kvm))
                kvm_tdp_mmu_invalidate_all_roots(kvm);

However, for the second comment (the one in the context above), there's much
more.  From easier to harder:

1) I'm basically clueless about the TLB flush "note" above.

2) It's not clear to me what needs to use for_each_tdp_mmu_root; for
example, why would anything but the MMU notifiers use for_each_tdp_mmu_root?
It is used in kvm_tdp_mmu_write_protect_gfn, kvm_tdp_mmu_try_split_huge_pages
and kvm_tdp_mmu_clear_dirty_pt_masked.

3) Does it make sense that yielding users of for_each_tdp_mmu_root must
either look at valid roots only, or take MMU lock for write?  If so, can
this be enforced in tdp_mmu_next_root?

Ok, I could understand this a little better now, but please correct me
if this is incorrect:

2) if I'm not wrong, kvm_tdp_mmu_try_split_huge_pages indeed does not
need to walk invalid roots. The others do because the TDP MMU does
not necessarily kick vCPUs after marking roots as invalid. But
because TDP MMU roots are gone for good once their refcount hits 0,
I wonder if we could do something like

diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 7e3d1f985811..a4a6dfee27f9 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -164,6 +164,7 @@ void kvm_tdp_mmu_put_root(struct kvm *kvm, struct kvm_mmu_page *root,
*/
if (!kvm_tdp_root_mark_invalid(root)) {
refcount_set(&root->tdp_mmu_root_count, 1);
+ kvm_make_all_cpus_request(kvm, KVM_REQ_MMU_FREE_OBSOLETE_ROOTS);
/*
* If the struct kvm is alive, we might as well zap the root
@@ -1099,12 +1100,16 @@ void kvm_tdp_mmu_zap_invalidated_roots(struct kvm *kvm)
void kvm_tdp_mmu_invalidate_all_roots(struct kvm *kvm)
{
struct kvm_mmu_page *root;
+ bool invalidated_root = false
lockdep_assert_held_write(&kvm->mmu_lock);
list_for_each_entry(root, &kvm->arch.tdp_mmu_roots, link) {
if (!WARN_ON_ONCE(!kvm_tdp_mmu_get_root(root)))
- root->role.invalid = true;
+ invalidated_root |= !kvm_tdp_root_mark_invalid(root);
}
+
+ if (invalidated_root)
+ kvm_make_all_cpus_request(kvm, KVM_REQ_MMU_FREE_OBSOLETE_ROOTS);
}
/*

(based on my own version of Sean's patches) and stop walking invalid roots
in kvm_tdp_mmu_write_protect_gfn and kvm_tdp_mmu_clear_dirty_pt_masked.


3) Yes, it makes sense that yielding users of for_each_tdp_mmu_root must
either look at valid roots only, or take MMU lock for write. The only
exception is kvm_tdp_mmu_try_split_huge_pages, which does not need to
walk invalid roots. And kvm_tdp_mmu_zap_invalidated_pages(), but that
one is basically an asynchronous worker [and this is where I had the
inspiration to get rid of the function altogether]

Paolo