Re: [PATCH v3 22/28] KVM: x86/mmu: Zap defunct roots via asynchronous worker

From: Sean Christopherson
Date: Wed Mar 02 2022 - 17:26:01 EST


On Wed, Mar 02, 2022, Paolo Bonzini wrote:
> On 3/2/22 21:47, Sean Christopherson wrote:
> > On Wed, Mar 02, 2022, Paolo Bonzini wrote:
> > > For now let's do it the simple but ugly way. Keeping
> > > next_invalidated_root() does not make things worse than the status quo, and
> > > further work will be easier to review if it's kept separate from this
> > > already-complex work.
> >
> > Oof, that's not gonna work. My approach here in v3 doesn't work either. I finally
> > remembered why I had the dedicated tdp_mmu_defunct_root flag and thus the smp_mb_*()
> > dance.
> >
> > kvm_tdp_mmu_zap_invalidated_roots() assumes that it was gifted a reference to
> > _all_ invalid roots by kvm_tdp_mmu_invalidate_all_roots(). This works in the
> > current code base only because kvm->slots_lock is held for the entire duration,
> > i.e. roots can't become invalid between the end of kvm_tdp_mmu_invalidate_all_roots()
> > and the end of kvm_tdp_mmu_zap_invalidated_roots().
>
> Yeah, of course that doesn't work if kvm_tdp_mmu_zap_invalidated_roots()
> calls kvm_tdp_mmu_put_root() and the worker also does the same
> kvm_tdp_mmu_put_root().
>
> But, it seems so me that we were so close to something that works and is
> elegant with the worker idea. It does avoid the possibility of two "puts",
> because the work item is created on the valid->invalid transition. What do
> you think of having a separate workqueue for each struct kvm, so that
> kvm_tdp_mmu_zap_invalidated_roots() can be replaced with a flush?

I definitely like the idea, but I'm getting another feeling of deja vu. Ah, I
think the mess I created was zapping via async worker without a dedicated workqueue,
and so the flush became very annoying/painful.

I have the "dedicated list" idea coded up. If testing looks good, I'll post it as
a v3.5 (without your xchg() magic or other kvm_tdp_mmu_put_root() changes). That
way we have a less-awful backup (and/or an intermediate step) if the workqueue
idea is delayed or doesn't work. Assuming it works, it's much prettier than having
a defunct flag.

> I can probably do it next Friday.

Early-ish warning, I'll be offline March 11th - March 23rd inclusive.

FWIW, other than saving me from another painful rebase, there's no urgent need to
get this series into 5.18.