Re: [PATCH v3 20/28] KVM: x86/mmu: Allow yielding when zapping GFNs for defunct TDP MMU root

From: Sean Christopherson
Date: Wed Mar 02 2022 - 12:43:49 EST


On Wed, Mar 02, 2022, Paolo Bonzini wrote:
> On 3/2/22 03:13, Sean Christopherson wrote:
> > That would work, though I'd prefer to recurse on kvm_tdp_mmu_put_root() instead
> > of open coding refcount_dec_and_test() so that we get coverage of the xchg()
> > doing the right thing.
> >
> > I still slightly prefer having the "free" path be inside the xchg(). To me, even
> > though the "free" path is the only one that's guaranteed to be reached for every root,
> > the fall-through to resetting the refcount and zapping the root is the "normal" path,
> > and the "free" path is the exception.
>
> Hmm, I can see how that makes sense, especially once you add in the worker logic.
> But it seems to me that the "basic" logic should be "do both the xchg and the
> free", and coding the function with tail recursion obfuscates this. Even with
> the worker, you grow an
>
> +	if (kvm_get_kvm_safe(kvm)) {
> +		... let the worker do it ...
> +		return;
> +	}
> +
> 	tdp_mmu_zap_root(kvm, root, shared);
>
> but you still have a downwards flow that matches what happens even if multiple
> threads pick up different parts of the job.
>
> So, I tried a bunch of alternatives including with gotos and with if/else, but
> really the above one remains my favorite.

Works for me.
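
Just to make sure we're picturing the same shape, the downward flow would end
up roughly like this (completely untested; the worker queueing is elided and
the details around it are approximate, the rest is lifted from the existing
put/free path):

void kvm_tdp_mmu_put_root(struct kvm *kvm, struct kvm_mmu_page *root,
			  bool shared)
{
	if (!refcount_dec_and_test(&root->tdp_mmu_root_count))
		return;

	/*
	 * Invalidate the root exactly once: the put that wins the xchg()
	 * resets the refcount and zaps the root (or hands it off to a
	 * worker).  Once the root is invalid and its refcount hits zero
	 * again, execution falls through to the free at the bottom.
	 */
	if (!xchg(&root->role.invalid, true)) {
		refcount_set(&root->tdp_mmu_root_count, 1);

		if (kvm_get_kvm_safe(kvm)) {
			/* ... let the worker zap the root and re-put it ... */
			return;
		}

		tdp_mmu_zap_root(kvm, root, shared);
		if (!refcount_dec_and_test(&root->tdp_mmu_root_count))
			return;
	}

	spin_lock(&kvm->arch.tdp_mmu_pages_lock);
	list_del_rcu(&root->link);
	spin_unlock(&kvm->arch.tdp_mmu_pages_lock);
	call_rcu(&root->rcu_head, tdp_mmu_free_sp_rcu_callback);
}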

> My plan would be:
>
> 1) splice the mini series I'm attaching before this patch, and
> remove patch 1 of this series. next_invalidated_root() is a
> bit yucky, but notably it doesn't need to add/remove a reference
> in kvm_tdp_mmu_zap_invalidated_roots().
>
> Together, these two steps ensure that readers never acquire a
> reference to either refcount=0/valid or invalid pages. In other
> words, the three states that kvm_tdp_mmu_put_root() moves the root
> through (refcount=0/valid -> refcount=0/invalid -> refcount=1/invalid)
> are exactly the same to readers, and there are essentially no races
> to worry about.
>
> In other other words, it's trading slightly uglier code for simpler
> invariants.

I'm glad you thought of the "immediately send to a worker" idea; I don't know
if I could stomach next_invalidated_root() continuing to live :-)
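
FWIW, the part I really like is that the reader side then boils down to a
single check, roughly (modulo the exact signature at this point in the
series):

static __must_check inline bool kvm_tdp_mmu_get_root(struct kvm_mmu_page *root)
{
	/*
	 * Invalid roots look dead to readers regardless of their refcount,
	 * i.e. refcount=0/valid, refcount=0/invalid and refcount=1/invalid
	 * are all indistinguishable here.
	 */
	if (root->role.invalid)
		return false;

	return refcount_inc_not_zero(&root->tdp_mmu_root_count);
}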

> @@ -879,7 +879,8 @@ bool kvm_tdp_mmu_zap_leafs(struct kvm *kvm, int as_id, gfn_t start, gfn_t end,
>  {
>  	struct kvm_mmu_page *root;
>
> -	for_each_tdp_mmu_root_yield_safe(kvm, root, as_id, false)
> +	lockdep_assert_held_write(&kvm->mmu_lock);
> +	for_each_tdp_mmu_root_yield_safe(kvm, root, as_id)
>  		flush = tdp_mmu_zap_leafs(kvm, root, start, end, can_yield, false);
>
>  	return flush;
> @@ -895,8 +896,9 @@ void kvm_tdp_mmu_zap_all(struct kvm *kvm)
>  	 * is being destroyed or the userspace VMM has exited. In both cases,
>  	 * KVM_RUN is unreachable, i.e. no vCPUs will ever service the request.
>  	 */
> +	lockdep_assert_held_write(&kvm->mmu_lock);
>  	for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
> -		for_each_tdp_mmu_root_yield_safe(kvm, root, i, false)
> +		for_each_tdp_mmu_root_yield_safe(kvm, root, i)
>  			tdp_mmu_zap_root(kvm, root, false);
>  	}
>  }
> --

If you hoist the patch "KVM: x86/mmu: Require mmu_lock be held for write in unyielding
root iter" to be the first in the series, then you can instead add the lockdep
assertion to the iterator itself. Slotted in earlier in the series, it becomes...

From: Paolo Bonzini <pbonzini@xxxxxxxxxx>
Date: Wed, 2 Mar 2022 09:25:30 -0800
Subject: [PATCH] KVM: x86/mmu: do not allow readers to acquire references to
invalid roots

Remove the "shared" argument of for_each_tdp_mmu_root_yield_safe, thus
ensuring that readers do not ever acquire a reference to an invalid root.
After this patch, all readers except kvm_tdp_mmu_zap_invalidated_roots()
treat refcount=0/valid, refcount=0/invalid and refcount=1/invalid in
exactly the same way. kvm_tdp_mmu_zap_invalidated_roots() is different
but it also does not acquire a reference to the invalid root, and it
cannot see refcount=0/invalid because it is guaranteed to run after
kvm_tdp_mmu_invalidate_all_roots().

Opportunistically add a lockdep assertion to the yield-safe iterator.

Signed-off-by: Paolo Bonzini <pbonzini@xxxxxxxxxx>
---
arch/x86/kvm/mmu/tdp_mmu.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 0cb834aa5406..46752093b79c 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -158,14 +158,15 @@ static struct kvm_mmu_page *tdp_mmu_next_root(struct kvm *kvm,
 	for (_root = tdp_mmu_next_root(_kvm, NULL, _shared, _only_valid);	\
 	     _root;								\
 	     _root = tdp_mmu_next_root(_kvm, _root, _shared, _only_valid))	\
-		if (kvm_mmu_page_as_id(_root) != _as_id) {			\
+		if (kvm_lockdep_assert_mmu_lock_held(_kvm, _shared) &&		\
+		    kvm_mmu_page_as_id(_root) != _as_id) {			\
 		} else
 
 #define for_each_valid_tdp_mmu_root_yield_safe(_kvm, _root, _as_id, _shared)	\
 	__for_each_tdp_mmu_root_yield_safe(_kvm, _root, _as_id, _shared, true)
 
-#define for_each_tdp_mmu_root_yield_safe(_kvm, _root, _as_id, _shared)		\
-	__for_each_tdp_mmu_root_yield_safe(_kvm, _root, _as_id, _shared, false)
+#define for_each_tdp_mmu_root_yield_safe(_kvm, _root, _as_id)			\
+	__for_each_tdp_mmu_root_yield_safe(_kvm, _root, _as_id, false, false)
 
 /*
  * Iterate over all TDP MMU roots. Requires that mmu_lock be held for write,
@@ -800,7 +801,7 @@ bool __kvm_tdp_mmu_zap_gfn_range(struct kvm *kvm, int as_id, gfn_t start,
 {
 	struct kvm_mmu_page *root;
 
-	for_each_tdp_mmu_root_yield_safe(kvm, root, as_id, false)
+	for_each_tdp_mmu_root_yield_safe(kvm, root, as_id)
 		flush = zap_gfn_range(kvm, root, start, end, can_yield, flush,
 				      false);


base-commit: 57352b7efee6b38eb9eb899b8f013652afe4e110
--
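
For reference, kvm_lockdep_assert_mmu_lock_held() above is the helper that the
hoisted lockdep patch adds, i.e. something like:

static __always_inline bool kvm_lockdep_assert_mmu_lock_held(struct kvm *kvm,
							      bool shared)
{
	if (shared)
		lockdep_assert_held_read(&kvm->mmu_lock);
	else
		lockdep_assert_held_write(&kvm->mmu_lock);

	/* Return true so the assert can live inside the iterator's if (). */
	return true;
}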