Re: [Patch] KVM: x86/mmu: Make optimized __handle_changed_spte() for clear dirty log

From: Ben Gardon
Date: Wed Jan 25 2023 - 17:25:23 EST


On Wed, Jan 25, 2023 at 2:00 PM David Matlack <dmatlack@xxxxxxxxxx> wrote:
>
> On Wed, Jan 25, 2023 at 01:38:57PM -0800, Vipin Sharma wrote:
> > Use a tone down version of __handle_changed_spte() when clearing dirty
> > log. Remove checks which will not be needed when dirty logs are cleared.
> >
> > This change shows ~13% improvement in clear dirty log calls in
> > dirty_log_perf_test
> >
> > Before tone down version:
> > Clear dirty log over 3 iterations took 10.006764203s. (Avg 3.335588067s/iteration)
> >
> > After tone down version:
> > Clear dirty log over 3 iterations took 8.686433554s. (Avg 2.895477851s/iteration)
> >
> > Signed-off-by: Vipin Sharma <vipinsh@xxxxxxxxxx>
> > ---
> > arch/x86/kvm/mmu/tdp_mmu.c | 20 +++++++++++++++++++-
> > 1 file changed, 19 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> > index bba33aea0fb0..ca21b33c4386 100644
> > --- a/arch/x86/kvm/mmu/tdp_mmu.c
> > +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> > @@ -504,6 +504,19 @@ static void handle_removed_pt(struct kvm *kvm, tdp_ptep_t pt, bool shared)
> > call_rcu(&sp->rcu_head, tdp_mmu_free_sp_rcu_callback);
> > }
> >
> > +static void handle_changed_spte_clear_dirty_log(int as_id, gfn_t gfn,
> > + u64 old_spte, u64 new_spte,
> > + int level)
> > +{
> > + if (old_spte == new_spte)
> > + return;
> > +
> > + trace_kvm_tdp_mmu_spte_changed(as_id, gfn, level, old_spte, new_spte);
> > +
> > + if (is_dirty_spte(old_spte) && !is_dirty_spte(new_spte))
> > + kvm_set_pfn_dirty(spte_to_pfn(old_spte));
> > +}
> > +
> > /**
> > * __handle_changed_spte - handle bookkeeping associated with an SPTE change
> > * @kvm: kvm instance
> > @@ -736,7 +749,12 @@ static u64 __tdp_mmu_set_spte(struct kvm *kvm, int as_id, tdp_ptep_t sptep,
> >
> > old_spte = kvm_tdp_mmu_write_spte(sptep, old_spte, new_spte, level);
> >
> > - __handle_changed_spte(kvm, as_id, gfn, old_spte, new_spte, level, false);
> > + if (record_dirty_log)
> > + __handle_changed_spte(kvm, as_id, gfn, old_spte, new_spte,
> > + level, false);
> > + else
> > + handle_changed_spte_clear_dirty_log(as_id, gfn, old_spte,
> > + new_spte, level);
>
> I find it very non-intuitive to tie this behavior to !record_dirty_log,
> even though it happens to work. It's also risky if any future calls are
> added that pass record_dirty_log=false but change other bits in the
> SPTE.
>
> I wonder if we could get the same effect by optimizing
> __handle_changed_spte() to check for a cleared D-bit *first* and if
> that's the only diff between the old and new SPTE, bail immediately
> rather than doing all the other checks.

I would also prefer that course. One big lesson I took when building
the TDP MMU was the value of having all the changed PTE handling go
through one function. That's not always the right choice but the
Shadow MMU has a crazy spaghetti code of different functions which
handle different parts of responding to a PTE change and it makes the
code very hard to read. I agree this path is worth optimizing, but the
more we can keep the code together, the better.