Re: [PATCH v7 00/12] KVM: Add host swap event notifications for PV guest

From: Marcelo Tosatti
Date: Mon Oct 18 2010 - 12:08:11 EST


On Thu, Oct 14, 2010 at 11:22:44AM +0200, Gleb Natapov wrote:
> KVM virtualizes guest memory by means of shadow pages or HW assistance
> like NPT/EPT. Not all memory used by a guest is mapped into the guest
> address space or even present in host memory at any given time.
> When a vcpu tries to access a memory page that is not mapped into the
> guest address space, KVM is notified about it. KVM maps the page into the
> guest address space and resumes vcpu execution. If the page is swapped
> out of host memory, vcpu execution is suspended until the page is swapped
> back in. This is inefficient since the vcpu could do other work (run
> another task or serve interrupts) while the page is being swapped in.
>
> The patch series tries to mitigate this problem by introducing two
> mechanisms. The first one is used with non-PV guests and works like
> this: when a vcpu tries to access a swapped-out page, it is halted and
> the requested page is swapped in by another thread. That way the vcpu can
> still process interrupts while the IO is happening in parallel and, with
> any luck, an interrupt will cause the guest to schedule another task on
> the vcpu, so it will have work to do instead of waiting for the page to
> be swapped in.
>
> The second mechanism introduces PV notification of swapped page state to
> a guest (asynchronous page fault). Instead of halting the vcpu upon
> access to a swapped-out page and hoping that some interrupt will cause a
> reschedule, we immediately inject an asynchronous page fault into the
> vcpu. A PV-aware guest knows that upon receiving such an exception it
> should schedule another task to run on the vcpu. The current task is put
> to sleep until another kind of asynchronous page fault is received,
> notifying the guest that the page is now in host memory, so the task
> waiting for it can run again.
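For readers following along, here is a minimal user-space sketch of the
guest-side bookkeeping described above. It is not the code from
arch/x86/kernel/kvm.c. It assumes that each fault carries a token pairing
a "page not present" notification with the later "page ready" one; the
token, apf_table, apf_handle() and APF_MAX_WAITERS are all hypothetical
names chosen for illustration, and the real patches may track sleeping
tasks quite differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define APF_MAX_WAITERS 8    /* hypothetical cap on sleeping tasks */
#define APF_NO_TASK    (-1)  /* nothing to wake up */
#define APF_NO_SLOT    (-2)  /* table full: caller must wait synchronously */

enum apf_reason { APF_PAGE_NOT_PRESENT, APF_PAGE_READY };

struct apf_waiter {
    bool in_use;
    uint32_t token;   /* assumed identifier shared by both notifications */
    int task_id;      /* task put to sleep on this token (assumed >= 0) */
};

struct apf_table {
    struct apf_waiter slots[APF_MAX_WAITERS];
};

/*
 * Handle one asynchronous page fault notification.
 *
 * APF_PAGE_NOT_PRESENT: remember that current_task sleeps on token and
 * return APF_NO_TASK; the caller would now schedule another task.
 * APF_PAGE_READY: return the id of the task sleeping on token so the
 * caller can wake it, or APF_NO_TASK if no task waits on that token.
 */
int apf_handle(struct apf_table *t, enum apf_reason reason,
               uint32_t token, int current_task)
{
    assert(t != NULL);

    for (size_t i = 0; i < APF_MAX_WAITERS; i++) {
        struct apf_waiter *w = &t->slots[i];

        if (reason == APF_PAGE_NOT_PRESENT && !w->in_use) {
            w->in_use = true;
            w->token = token;
            w->task_id = current_task;
            return APF_NO_TASK;
        }
        if (reason == APF_PAGE_READY && w->in_use && w->token == token) {
            w->in_use = false;
            return w->task_id;
        }
    }
    return reason == APF_PAGE_NOT_PRESENT ? APF_NO_SLOT : APF_NO_TASK;
}
```

A caller would zero-initialize the table, pass APF_PAGE_NOT_PRESENT when
the first kind of fault arrives, and wake whatever task id comes back
from a later APF_PAGE_READY with the same token.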
>
> To measure the performance benefits I use a simple benchmark program
> (below) that starts a number of threads. Some of them do work (increment
> a counter), others access a huge array at random locations, trying to
> generate host page faults. The size of the array is smaller than guest
> memory but bigger than host memory, so we are guaranteed that the host
> will swap out part of the array.
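The original benchmark source is not reproduced in this message. What
follows is only a rough sketch of a program with the shape described
above, written from the description alone: "working" threads increment a
counter, "faulting" threads read random pages of a large array. The
function name bm_run(), the 4096-byte page size, the thread cap and the
xorshift generator are assumptions of this sketch, not details of the
real program.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define BM_PAGE 4096u        /* assumed page size */
#define BM_MAX_THREADS 64    /* arbitrary cap for this sketch */

static atomic_bool bm_stop;  /* set once the run time has elapsed */

struct bm_thread {
    pthread_t tid;
    const unsigned char *array;  /* shared, read-only after setup */
    size_t npages;
    uint64_t rng;                /* per-thread xorshift state */
    uint64_t count;              /* loop iterations completed */
    uint64_t sink;               /* keeps the page reads observable */
};

static uint64_t bm_rand(uint64_t *s)
{
    *s ^= *s << 13;
    *s ^= *s >> 7;
    *s ^= *s << 17;
    return *s;
}

/* Working thread: pure CPU work, one increment per iteration. */
static void *bm_worker(void *arg)
{
    struct bm_thread *t = arg;
    uint64_t n = 0;

    while (!atomic_load_explicit(&bm_stop, memory_order_relaxed))
        n++;
    t->count = n;
    return NULL;
}

/* Faulting thread: read one byte from a randomly chosen page. */
static void *bm_faulter(void *arg)
{
    struct bm_thread *t = arg;
    uint64_t n = 0, sum = 0;

    while (!atomic_load_explicit(&bm_stop, memory_order_relaxed)) {
        size_t page = (size_t)(bm_rand(&t->rng) % t->npages);
        sum += t->array[page * BM_PAGE];
        n++;
    }
    t->count = n;
    t->sink = sum;
    return NULL;
}

/* Returns the total "work" done by working threads, 0 on bad input. */
uint64_t bm_run(int n_fault, int n_work, unsigned ms, size_t array_bytes)
{
    struct bm_thread th[BM_MAX_THREADS] = {0};
    int total, started = 0;
    uint64_t work = 0;

    if (n_fault < 0 || n_work < 0 || n_fault > BM_MAX_THREADS ||
        n_work > BM_MAX_THREADS - n_fault || array_bytes < BM_PAGE)
        return 0;
    total = n_fault + n_work;

    unsigned char *array = malloc(array_bytes);
    if (!array)
        return 0;
    memset(array, 1, array_bytes);  /* touch every page so it is backed */
    atomic_store(&bm_stop, false);

    for (int i = 0; i < total; i++) {
        th[i].array = array;
        th[i].npages = array_bytes / BM_PAGE;
        th[i].rng = 0x9E3779B97F4A7C15ull + (uint64_t)i;
        if (pthread_create(&th[i].tid, NULL,
                           i < n_fault ? bm_faulter : bm_worker, &th[i]))
            break;
        started++;
    }

    struct timespec ts = { .tv_sec = ms / 1000,
                           .tv_nsec = (long)(ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);
    atomic_store(&bm_stop, true);

    for (int i = 0; i < started; i++) {
        pthread_join(th[i].tid, NULL);
        if (i >= n_fault)           /* only working threads count */
            work += th[i].count;
    }
    free(array);
    return work;
}
```

Under these assumptions, a run shaped like the "-f 4 -w 4 -t 60" command
line would be bm_run(4, 4, 60000, array_bytes). The array size is left
as a parameter because the cover letter only bounds it between host and
guest memory.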
>
> I ran the benchmark on three setups: with current kvm.git (master),
> with my patch series + non-pv guest (nonpv) and with my patch series +
> pv guest (pv).
>
> Each guest had 4 cpus and 2G of memory and was launched inside a 512M
> memory container. The command line was "./bm -f 4 -w 4 -t 60" (run 4
> faulting threads and 4 working threads for a minute).
>
> Below is the total amount of "work" each guest managed to do
> (average of 10 runs):
>            total work      std error
> master:  122789420615   (3818565029)
> nonpv:   138455939001    (773774299)
> pv:      234351846135  (10461117116)
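To put the totals above in relative terms, a small helper can express
each setup as a percentage gain over master. This is plain arithmetic on
the three quoted averages; gain_percent() and the constant names are
made up for this illustration. The quoted std error values are not used,
so the result says nothing about significance.

```c
#include <assert.h>

/* Average total work from the table above (10 runs each). */
static const double WORK_MASTER = 122789420615.0;
static const double WORK_NONPV  = 138455939001.0;
static const double WORK_PV     = 234351846135.0;

/* Percentage by which value exceeds baseline. */
double gain_percent(double baseline, double value)
{
    assert(baseline > 0.0);
    return (value / baseline - 1.0) * 100.0;
}
```

With these inputs, gain_percent(WORK_MASTER, WORK_NONPV) comes out at
roughly 12.8 and gain_percent(WORK_MASTER, WORK_PV) at roughly 90.9.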
>
> Changes:
>  v1->v2
>   Use MSR instead of hypercall.
>   Move most of the code into arch independent place.
>   Halt inside a guest instead of doing "wait for page" hypercall if
>    preemption is disabled.
>  v2->v3
>   Use MSR from range 0x4b564dxx.
>   Add slot version tracking.
>   Support migration by restarting all guest processes after migration.
>   Drop patch that tracks preemptability for non-preemptable kernels
>    due to performance concerns. Send async PF to non-preemptable
>    guests only when vcpu is executing userspace code.
>  v3->v4
>   Provide alternative page fault handler in PV guest instead of adding
>    hook to standard page fault handler and patching it out on non-PV
>    guests.
>   Allow only a limited number of outstanding async page faults per vcpu.
>   Unify gfn_to_pfn and gfn_to_pfn_async code.
>   Cancel outstanding slow work on reset.
>  v4->v5
>   Move async pv cpu initialization into cpu hotplug notifier.
>   Use GFP_NOWAIT instead of GFP_ATOMIC for allocation that shouldn't sleep.
>   Process KVM_REQ_MMU_SYNC even in page_fault_other_cr3() before changing
>    cr3 back.
>  v5->v6
>   Too many. Will list only major changes here.
>   Replace slow work with work queues.
>   Halt vcpu for non-pv guests.
>   Handle async PF in nested SVM mode.
>   Do not prefault swapped in page for non tdp case.
>  v6->v7
>   Fix "GUP fail in work thread" problem.
>   Do prefault only if mmu is in direct map mode.
>   Use cpu->request to ask for vcpu halt (drop optimization that tried to
>    skip non-present apf injection if page is swapped in before next
>    vmentry).
>   Keep track of synthetic halt in separate state to prevent it from
>    leaking during migration.
>   Fix memslot tracking problems.
>   More documentation.
>   Other small comments are addressed.
>
> Gleb Natapov (12):
>   Add get_user_pages() variant that fails if major fault is required.
>   Halt vcpu if page it tries to access is swapped out.
>   Retry fault before vmentry
>   Add memory slot versioning and use it to provide fast guest write interface
>   Move kvm_smp_prepare_boot_cpu() from kvmclock.c to kvm.c.
>   Add PV MSR to enable asynchronous page faults delivery.
>   Add async PF initialization to PV guest.
>   Handle async PF in a guest.
>   Inject asynchronous page fault into a PV guest if page is swapped out.
>   Handle async PF in non preemptable context
>   Let host know whether the guest can handle async PF in non-userspace context.
>   Send async PF when guest is not in userspace too.
>
>  Documentation/kernel-parameters.txt |    3 +
>  Documentation/kvm/cpuid.txt         |    3 +
>  Documentation/kvm/msr.txt           |   36 ++++-
>  arch/x86/include/asm/kvm_host.h     |   28 +++-
>  arch/x86/include/asm/kvm_para.h     |   24 +++
>  arch/x86/include/asm/traps.h        |    1 +
>  arch/x86/kernel/entry_32.S          |   10 +
>  arch/x86/kernel/entry_64.S          |    3 +
>  arch/x86/kernel/kvm.c               |  315 +++++++++++++++++++++++++++++++++++
>  arch/x86/kernel/kvmclock.c          |   13 +--
>  arch/x86/kvm/Kconfig                |    1 +
>  arch/x86/kvm/Makefile               |    1 +
>  arch/x86/kvm/mmu.c                  |   61 ++++++-
>  arch/x86/kvm/paging_tmpl.h          |    8 +-
>  arch/x86/kvm/svm.c                  |   45 ++++-
>  arch/x86/kvm/x86.c                  |  192 +++++++++++++++++++++-
>  fs/ncpfs/mmap.c                     |    2 +
>  include/linux/kvm.h                 |    1 +
>  include/linux/kvm_host.h            |   39 +++++
>  include/linux/kvm_types.h           |    7 +
>  include/linux/mm.h                  |    5 +
>  include/trace/events/kvm.h          |   95 +++++++++++
>  mm/filemap.c                        |    3 +
>  mm/memory.c                         |   31 +++-
>  mm/shmem.c                          |    8 +-
>  virt/kvm/Kconfig                    |    3 +
>  virt/kvm/async_pf.c                 |  213 +++++++++++++++++++++++
>  virt/kvm/async_pf.h                 |   36 ++++
>  virt/kvm/kvm_main.c                 |  132 ++++++++++++---
>  29 files changed, 1255 insertions(+), 64 deletions(-)
>  create mode 100644 virt/kvm/async_pf.c
>  create mode 100644 virt/kvm/async_pf.h

Applied, thanks.
