Re: [PATCH v2 00/10] KVM: Consolidate and optimize MMU notifiers

From: Marc Zyngier
Date: Mon Apr 12 2021 - 06:27:35 EST


On Fri, 02 Apr 2021 13:17:45 +0100,
Paolo Bonzini <pbonzini@xxxxxxxxxx> wrote:
>
> On 02/04/21 02:56, Sean Christopherson wrote:
> > The end goal of this series is to optimize the MMU notifiers to take
> > mmu_lock if and only if the notification is relevant to KVM, i.e. the hva
> > range overlaps a memslot. Large VMs (hundreds of vCPUs) are very
> > sensitive to mmu_lock being taken for write at inopportune times, and
> > such VMs also tend to be "static", e.g. backed by HugeTLB with minimal
> > page shenanigans. The vast majority of notifications for these VMs will
> > be spurious (for KVM), and eliding mmu_lock for spurious notifications
> > avoids an otherwise unacceptable disruption to the guest.
> >
> > To get there without potentially degrading performance, e.g. due to
> > multiple memslot lookups, especially on non-x86 where the use cases are
> > largely unknown (from my perspective), first consolidate the MMU notifier
> > logic by moving the hva->gfn lookups into common KVM.
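
For readers skimming the thread, the idea described above (clamp the
notifier's hva range against each memslot, convert the overlap to a gfn
range, and only take the lock once something actually overlaps) can be
sketched roughly as below. This is a simplified userspace illustration,
not the code from the series: every sketch_* name, the 4 KiB page size,
and the counter standing in for mmu_lock are assumptions made for the
example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12 /* assumption: 4 KiB pages */
#define SKETCH_PAGE_SIZE  (UINT64_C(1) << SKETCH_PAGE_SHIFT)

/* Hypothetical, simplified memslot: a contiguous hva range backing gfns. */
struct sketch_memslot {
	uint64_t base_gfn;       /* first guest frame number in the slot */
	uint64_t npages;         /* slot size in pages */
	uint64_t userspace_addr; /* page-aligned hva of the first page */
};

/* Half-open gfn range [start, end). */
struct sketch_gfn_range {
	uint64_t start;
	uint64_t end;
};

/* Stand-in for a VM; "locked" and the counter model mmu_lock usage. */
struct sketch_vm {
	const struct sketch_memslot *slots;
	size_t nr_slots;
	bool locked;
	unsigned int lock_acquisitions;
};

typedef void (*sketch_handler_t)(struct sketch_vm *vm,
				 struct sketch_gfn_range range);

/*
 * Clamp [start, end) to the slot and convert the overlap to gfns.
 * Returns false if the hva range does not touch the slot at all.
 */
static bool sketch_hva_to_gfn_range(const struct sketch_memslot *slot,
				    uint64_t start, uint64_t end,
				    struct sketch_gfn_range *out)
{
	uint64_t slot_end = slot->userspace_addr +
			    (slot->npages << SKETCH_PAGE_SHIFT);
	uint64_t hva_start = start > slot->userspace_addr ?
			     start : slot->userspace_addr;
	uint64_t hva_end = end < slot_end ? end : slot_end;

	if (hva_start >= hva_end)
		return false;

	out->start = slot->base_gfn +
		     ((hva_start - slot->userspace_addr) >> SKETCH_PAGE_SHIFT);
	/* Round up so a partially covered last page is included. */
	out->end = slot->base_gfn +
		   ((hva_end - slot->userspace_addr + SKETCH_PAGE_SIZE - 1)
		    >> SKETCH_PAGE_SHIFT);
	return true;
}

/*
 * Walk the slots and call the handler for each overlapping gfn range.
 * The lock is taken lazily, on the first overlap, so a notification that
 * hits no memslot never acquires it. Returns the number of slots hit.
 */
static unsigned int sketch_handle_hva_range(struct sketch_vm *vm,
					    uint64_t start, uint64_t end,
					    sketch_handler_t handler)
{
	struct sketch_gfn_range range;
	unsigned int hits = 0;
	size_t i;

	assert(start < end);

	for (i = 0; i < vm->nr_slots; i++) {
		if (!sketch_hva_to_gfn_range(&vm->slots[i], start, end, &range))
			continue;

		if (!vm->locked) {
			vm->locked = true;
			vm->lock_acquisitions++;
		}
		if (handler)
			handler(vm, range);
		hits++;
	}

	if (vm->locked)
		vm->locked = false;

	return hits;
}
```

With a single 16-page slot at hva 0x10000000, a notification for an
unrelated hva range returns 0 hits and leaves lock_acquisitions at 0,
which is the "spurious notification" case the cover letter wants to make
cheap.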
> >
> > Based on kvm/queue, commit 5f986f748438 ("KVM: x86: dump_vmcs should
> > include the autoload/autostore MSR lists").
> >
> > Well tested on Intel and AMD. Compile tested for arm64, MIPS, PPC,
> > PPC e500, and s390. Absolutely needs to be tested for real on non-x86,
> > I give it even odds that I introduced an off-by-one bug somewhere.
> >
> > v2:
> >  - Drop the patches that have already been pushed to kvm/queue.
> >  - Drop two selftest changes that had snuck in via "git commit -a".
> >  - Add a patch to assert that mmu_notifier_count is elevated when
> >    .change_pte() runs. [Paolo]
> >  - Split out moving KVM_MMU_(UN)LOCK() to __kvm_handle_hva_range() to a
> >    separate patch. Opted not to squash it with the introduction of the
> >    common hva walkers (patch 02), as that prevented sharing code between
> >    the old and new APIs. [Paolo]
> >  - Tweak the comment in kvm_vm_destroy() above the smashing of the new
> >    slots lock. [Paolo]
> >  - Make mmu_notifier_slots_lock unconditional to avoid #ifdefs. [Paolo]
> >
> > v1:
> > - https://lkml.kernel.org/r/20210326021957.1424875-1-seanjc@xxxxxxxxxx
> >
> > Sean Christopherson (10):
> >   KVM: Assert that notifier count is elevated in .change_pte()
> >   KVM: Move x86's MMU notifier memslot walkers to generic code
> >   KVM: arm64: Convert to the gfn-based MMU notifier callbacks
> >   KVM: MIPS/MMU: Convert to the gfn-based MMU notifier callbacks
> >   KVM: PPC: Convert to the gfn-based MMU notifier callbacks
> >   KVM: Kill off the old hva-based MMU notifier callbacks
> >   KVM: Move MMU notifier's mmu_lock acquisition into common helper
> >   KVM: Take mmu_lock when handling MMU notifier iff the hva hits a
> >     memslot
> >   KVM: Don't take mmu_lock for range invalidation unless necessary
> >   KVM: x86/mmu: Allow yielding during MMU notifier unmap/zap, if
> >     possible
> >
> >  arch/arm64/kvm/mmu.c                   | 117 +++------
> >  arch/mips/kvm/mmu.c                    |  97 ++------
> >  arch/powerpc/include/asm/kvm_book3s.h  |  12 +-
> >  arch/powerpc/include/asm/kvm_ppc.h     |   9 +-
> >  arch/powerpc/kvm/book3s.c              |  18 +-
> >  arch/powerpc/kvm/book3s.h              |  10 +-
> >  arch/powerpc/kvm/book3s_64_mmu_hv.c    |  98 ++------
> >  arch/powerpc/kvm/book3s_64_mmu_radix.c |  25 +-
> >  arch/powerpc/kvm/book3s_hv.c           |  12 +-
> >  arch/powerpc/kvm/book3s_pr.c           |  56 ++---
> >  arch/powerpc/kvm/e500_mmu_host.c       |  27 +-
> >  arch/x86/kvm/mmu/mmu.c                 | 127 ++++------
> >  arch/x86/kvm/mmu/tdp_mmu.c             | 245 +++++++------------
> >  arch/x86/kvm/mmu/tdp_mmu.h             |  14 +-
> >  include/linux/kvm_host.h               |  22 +-
> >  virt/kvm/kvm_main.c                    | 325 +++++++++++++++++++------
> >  16 files changed, 552 insertions(+), 662 deletions(-)
> >
>
> For MIPS, I am going to post a series that simplifies TLB flushing
> further. I applied it, and rebased this one on top, to
> kvm/mmu-notifier-queue.
>
> Architecture maintainers, please look at the branch and
> review/test/ack your parts.

I've given this a reasonably good beating on arm64 for both VHE and
nVHE HW, and nothing caught fire, although I was left with a conflict
in the x86 code after merging with linux/master.

Feel free to add a

Tested-by: Marc Zyngier <maz@xxxxxxxxxx>

for the arm64 side.

M.

--
Without deviation from the norm, progress is not possible.