Re: [PATCH v4 21/30] KVM: x86/mmu: Zap invalidated roots via asynchronous worker

From: Sean Christopherson
Date: Thu Mar 03 2022 - 16:06:24 EST


On Thu, Mar 03, 2022, Sean Christopherson wrote:
> On Thu, Mar 03, 2022, Paolo Bonzini wrote:
> > + root->tdp_mmu_async_data = kvm;
> > + INIT_WORK(&root->tdp_mmu_async_work, tdp_mmu_zap_root_work);
> > + queue_work(kvm->arch.tdp_mmu_zap_wq, &root->tdp_mmu_async_work);
> > +}
> > +
> > +static inline bool kvm_tdp_root_mark_invalid(struct kvm_mmu_page *page)
> > +{
> > + union kvm_mmu_page_role role = page->role;
> > + role.invalid = true;
> > +
> > + /* No need to use cmpxchg, only the invalid bit can change. */
> > + role.word = xchg(&page->role.word, role.word);
> > + return role.invalid;
>
> This helper is unused. It _could_ be used here, but I think it belongs in the
> next patch. Critically, until zapping defunct roots creates the invariant that
> invalid roots are _always_ zapped via worker, kvm_tdp_mmu_invalidate_all_roots()
> must not assume that an invalid root is queued for zapping. I.e. doing this
> before the "Zap defunct roots" would be wrong:
>
> list_for_each_entry(root, &kvm->arch.tdp_mmu_roots, link) {
> if (kvm_tdp_root_mark_invalid(root))
> continue;
>
> if (WARN_ON_ONCE(!kvm_tdp_mmu_get_root(root)))
> continue;
>
> tdp_mmu_schedule_zap_root(kvm, root);
> }

Gah, lost my train of thought and forgot that this _can_ re-queue a root even in
this patch, it just can't re-queue a root that is _currently_ queued.

The re-queue scenario happens if a root is queued and zapped, but is kept alive
by a vCPU that hasn't yet put its reference. If another memslot update comes along
before the (sleeping) vCPU drops its reference, this will re-queue the root.

It's not a major problem in this patch as it's a small amount of wasted effort,
but it will be an issue when the "put" path starts using the queue, as that will
create a scenario where a memslot update (or NX toggle) can come along while a
defunct root is in the zap queue.

Checking for role.invalid is wrong (as above), so for this patch I think the
easiest thing is to use tdp_mmu_async_data as a sentinel that the root was zapped
in the past and doesn't need to be re-zapped.

/*
* Mark each TDP MMU root as invalid to prevent vCPUs from reusing a root that
* is about to be zapped, e.g. in response to a memslots update. The actual
* zapping is performed asynchronously, so a reference is taken on all roots.
* Using a separate workqueue makes it easy to ensure that the destruction is
* performed before the "fast zap" completes, without keeping a separate list
* of invalidated roots; the list is effectively the list of work items in
* the workqueue.
*
* Skip roots that were already queued for zapping; the "fast zap" path is the
* only user of the zap queue and always flushes the queue under slots_lock,
* i.e. the queued zap is guaranteed to have completed already.
*
* Because mmu_lock is held for write, it should be impossible to observe a
* root with zero refcount, i.e. the list of roots cannot be stale.
*
* This has essentially the same effect for the TDP MMU
* as updating mmu_valid_gen does for the shadow MMU.
*/
void kvm_tdp_mmu_invalidate_all_roots(struct kvm *kvm)
{
	struct kvm_mmu_page *root;

	lockdep_assert_held_write(&kvm->mmu_lock);
	list_for_each_entry(root, &kvm->arch.tdp_mmu_roots, link) {
		if (root->tdp_mmu_async_data)
			continue;

		if (WARN_ON_ONCE(!kvm_tdp_mmu_get_root(root)))
			continue;

		root->role.invalid = true;
		tdp_mmu_schedule_zap_root(kvm, root);
	}
}
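
To make the re-queue scenario concrete, here is a rough userspace toy model of
the sentinel check. It is not kernel code: struct toy_root and every toy_*
helper are made up for illustration, the workqueue is collapsed into a
synchronous "schedule and flush" step, and the worker dropping the reference
that was taken for it is an assumption of the model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct kvm_mmu_page; only what the model needs. */
struct toy_root {
	bool invalid;		/* models role.invalid */
	void *async_data;	/* models tdp_mmu_async_data, used as the sentinel */
	int refcount;		/* references held by vCPUs and by queued work */
	int zap_count;		/* how many times this root has been zapped */
};

/*
 * Models queueing the work item and then flushing the queue: the root is
 * zapped and the reference taken for the worker is dropped (assumption).
 */
static void toy_schedule_and_flush(struct toy_root *root, void *owner)
{
	root->async_data = owner;
	root->zap_count++;
	assert(root->refcount > 0);
	root->refcount--;
}

/* Models the invalidate-all-roots loop over an array instead of a list. */
static void toy_invalidate_all(struct toy_root *roots, size_t n, void *owner,
			       bool use_sentinel)
{
	for (size_t i = 0; i < n; i++) {
		struct toy_root *root = &roots[i];

		/* Sentinel: this root was already queued and zapped. */
		if (use_sentinel && root->async_data)
			continue;

		/* Models a failed "get": a dead root cannot be referenced. */
		if (root->refcount == 0)
			continue;

		root->refcount++;
		root->invalid = true;
		toy_schedule_and_flush(root, owner);
	}
}

/*
 * Two back-to-back invalidations while a sleeping vCPU still holds its
 * reference on the root. Returns how many times the root was zapped.
 */
static int toy_zaps_after_two_updates(bool use_sentinel)
{
	int owner;	/* only its address is used, as a non-NULL cookie */
	struct toy_root root = { .refcount = 1 };

	toy_invalidate_all(&root, 1, &owner, use_sentinel);
	toy_invalidate_all(&root, 1, &owner, use_sentinel);
	return root.zap_count;
}
```

In this model, toy_zaps_after_two_updates(false) zaps the root twice (the
wasted re-queue), whereas toy_zaps_after_two_updates(true) zaps it once.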