Re: [PATCH 19/24] kvm: x86/mmu: Protect tdp_mmu_pages with a lock

From: Paolo Bonzini
Date: Tue Jan 26 2021 - 09:30:54 EST


On 21/01/21 22:32, Sean Christopherson wrote:
Coming back to this series, I wonder if the RCU approach is truly necessary to
get the desired scalability. If both zap_collapsible_sptes() and NX huge page
recovery zap_only_ leaf SPTEs, then the only path that can actually unlink a
shadow page while holding the lock for read is the page fault path that installs
a huge page over an existing shadow page.

Assuming the above analysis is correct, I think it's worth exploring alternatives
to using RCU to defer freeing the SP memory, e.g. promoting to a write lock in
the specific case of overwriting a SP (though that may not exist for rwlocks),
or maybe something entirely different?

You can do the deferred freeing with a short write-side critical section to ensure all readers have terminated.

If the bool argument to handle_disconnected_tdp_mmu_page is true(*), the pages would be added to an llist, instead of being freed immediately. At the end of a shared critical section you would do

if (!llist_empty(&kvm->arch.tdp_mmu_disconnected_pages)) {
struct llist_node *first;
kvm_mmu_lock(kvm);
first = __list_del_all(&kvm->arch.tdp_mmu_disconnected_pages);
kvm_mmu_unlock(kvm);

/*
* All vCPUs have already stopped using the pages when
* their TLBs were flushed. The exclusive critical
* section above means that there can be no readers
* either.
*/
tdp_mmu_free_disconnected_pages(first);
}

So this is still deferred reclamation, but it's done by one of the vCPUs rather than a worker RCU thread. This would replace patches 11/12/13 and probably would be implemented after patch 18.

Paolo

(*) this idea is what prompted the comment about s/atomic/shared/