Re: [PATCH v4 21/30] KVM: x86/mmu: Zap invalidated roots via asynchronous worker

From: Paolo Bonzini
Date: Fri Mar 04 2022 - 01:48:35 EST


On 3/3/22 22:32, Sean Christopherson wrote:

The re-queue scenario happens if a root is queued and zapped, but is kept alive
by a vCPU that hasn't yet put its reference. If another memslot comes along before
the (sleeping) vCPU drops its reference, this will re-queue the root.

It's not a major problem in this patch as it's a small amount of wasted effort,
but it will be an issue when the "put" path starts using the queue, as that will
create a scenario where a memslot update (or NX toggle) can come along while a
defunct root is in the zap queue.

As of this patch it's not a problem because kvm_tdp_mmu_invalidate_all_roots()'s caller holds kvm->slots_lock, so kvm_tdp_mmu_invalidate_all_roots() is guarantee to queue its work items on an empty workqueue. In effect the workqueue is just a fancy list. But as you point out in the review to patch 24, it becomes a problem when there's no kvm->slots_lock to guarantee that. Then it needs to check that the root isn't already invalid.

The only issue is that kvm_tdp_mmu_invalidate_all_roots() now assumes that
there is at least one reference in kvm->users_count; so if the VM is dying
just go through the slow path, as there is nothing to gain by using the fast
zapping.
This isn't actually implemented.:-)
Oh, and when you implement it (or copy paste), can you also add a lockdep and a
comment about the check being racy, but that the race is benign? It took me a
second to realize why it's safe to use a work queue without holding a reference
to @kvm.

I didn't remove the paragraph from the commit message, but I think it's unnecessary now. The workqueue is flushed in kvm_mmu_zap_all_fast() and kvm_mmu_uninit_tdp_mmu(), unlike the buggy patch, so it doesn't need to take a reference to the VM.

I think I don't even need to check kvm->users_count in the defunct root case, as long as kvm_mmu_uninit_tdp_mmu() flushes and destroys the workqueue before it checks that the lists are empty.

I'll wait to hear from you later today before sending out v5.

Paolo