Re: [PATCH v3 20/28] KVM: x86/mmu: Allow yielding when zapping GFNs for defunct TDP MMU root

From: Paolo Bonzini
Date: Tue Mar 01 2022 - 15:12:55 EST


On 3/1/22 20:43, Sean Christopherson wrote:
and after spending quite some time I wonder if all this should just be

if (refcount_dec_not_one(&root->tdp_mmu_root_count))
return;

if (!xchg(&root->role.invalid, true) {

The refcount being '1' means there's another task currently using root, marking
the root invalid will mean checks on the root's validity are non-deterministic
for the other task.

Do you mean it's not possible to use refcount_dec_not_one, otherwise kvm_tdp_mmu_get_root is not guaranteed to reject the root?

tdp_mmu_zap_root(kvm, root, shared);

/*
* Do not assume the refcount is still 1: because
* tdp_mmu_zap_root can yield, a different task
* might have grabbed a reference to this root.
*
if (refcount_dec_not_one(&root->tdp_mmu_root_count))

This is wrong, _this_ task can't drop a reference taken by the other task.

This is essentially the "kvm_tdp_mmu_put_root(kvm, root, shared);" (or "goto beginning_of_function;") part of your patch.

return;
}

/*
* The root is invalid, and its reference count has reached
* zero. It must have been zapped either in the "if" above or
* by someone else, and we're definitely the last thread to see
* it apart from RCU-protected page table walks.
*/
refcount_set(&root->tdp_mmu_root_count, 0);

Not sure what you intended here, KVM should never force a refcount to '0'.

It's turning a refcount_dec_not_one into a refcount_dec_and_test. It seems legit to me, because the only refcount_inc_not_zero is in a write-side critical section. If the refcount goes to zero on the read-side, the root is gone for good.

xchg() is a very good idea. The smp_mb_*() stuff was carried over from the previous
version where this sequence set another flag in addition to role.invalid.

Is this less funky (untested)?

/*
* Invalidate the root to prevent it from being reused by a vCPU while
* the root is being zapped, i.e. to allow yielding while zapping the
* root (see below).
*
* Free the root if it's already invalid. Invalid roots must be zapped
* before their last reference is put, i.e. there's no work to be done,
* and all roots must be invalidated before they're freed (this code).
* Re-zapping invalid roots would put KVM into an infinite loop.
*
* Note, xchg() provides an implicit barrier to ensure role.invalid is
* visible if a concurrent reader acquires a reference after the root's
* refcount is reset.
*/
if (xchg(root->role.invalid, true))
spin_lock(&kvm->arch.tdp_mmu_pages_lock);
list_del_rcu(&root->link);
spin_unlock(&kvm->arch.tdp_mmu_pages_lock);

call_rcu(&root->rcu_head, tdp_mmu_free_sp_rcu_callback);
return;
}

Based on my own version, I guess you mean (without comments due to family NMI):

if (!refcount_dec_and_test(&root->tdp_mmu_root_count))
return;

if (!xchg(&root->role.invalid, true) {
refcount_set(&root->tdp_mmu_count, 1);
tdp_mmu_zap_root(kvm, root, shared);
if (!refcount_dec_and_test(&root->tdp_mmu_root_count))
return;
}

spin_lock(&kvm->arch.tdp_mmu_pages_lock);
list_del_rcu(&root->link);
spin_unlock(&kvm->arch.tdp_mmu_pages_lock);
call_rcu(&root->rcu_head, tdp_mmu_free_sp_rcu_callback);

Paolo