Re: [PATCH v3 1/2] KVM: VMX: FIXED+PHYSICAL mode single target IPI fastpath

From: Wanpeng Li
Date: Wed Nov 20 2019 - 22:16:47 EST


On Thu, 21 Nov 2019 at 07:37, Liran Alon <liran.alon@xxxxxxxxxx> wrote:
>
>
>
> > On 20 Nov 2019, at 5:42, Wanpeng Li <kernellwp@xxxxxxxxx> wrote:
> >
> > From: Wanpeng Li <wanpengli@xxxxxxxxxxx>
> >
> > In our product observations, ICR and TSCDEADLINE MSR writes cause the
> > majority of MSR-write vmexits, and multicast IPIs are not as common as
> > unicast IPIs like RESCHEDULE_VECTOR and CALL_FUNCTION_SINGLE_VECTOR etc.
>
> Have you also had the chance to measure non-Linux workloads, such as Windows?

I asked around; we have not paid attention to IPIs under Windows guests before.

>
> >
> > This patch tries to optimize x2apic physical destination mode, fixed
> > delivery mode single target IPI. The fast path is invoked at
> > ->handle_exit_irqoff() to emulate only the effect of the ICR write
> > itself, i.e. the sending of IPIs. Sending IPIs early in the VM-Exit
> > flow reduces the latency of virtual IPIs by avoiding the expensive bits
> > of transitioning from guest to host, e.g. reacquiring KVM's SRCU lock.
> > Especially when running a guest with the KVM_CAP_X86_DISABLE_EXITS
> > capability enabled, or when the guest can keep running, the IPI can be
> > injected into the target vCPU by posted-interrupt immediately.
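For anyone skimming the thread: the fast-path condition described above
boils down to a filter on the written ICR value, roughly like the sketch
below. This is a minimal standalone illustration; the macro and function
names are mine, not necessarily the ones used in the patch.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* x2APIC ICR is a single 64-bit MSR; bit layout per the Intel SDM. */
#define X2APIC_ICR_MSR		0x830
#define ICR_DEST_MODE_LOGICAL	(1ULL << 11)	/* clear = physical destination */
#define ICR_DELIVERY_MODE_MASK	(0x7ULL << 8)
#define ICR_DELIVERY_FIXED	(0x0ULL << 8)
#define ICR_DEST_SHORTHAND_MASK	(0x3ULL << 18)	/* clear = no shorthand */

/*
 * Illustrative name: returns true if this WRMSR is a single-target,
 * fixed delivery, physical destination mode IPI that a fast path
 * could handle early in the VM-Exit handler.
 */
static bool fastpath_candidate_ipi(uint32_t msr, uint64_t data)
{
	if (msr != X2APIC_ICR_MSR)
		return false;
	if (data & ICR_DEST_MODE_LOGICAL)
		return false;
	if ((data & ICR_DELIVERY_MODE_MASK) != ICR_DELIVERY_FIXED)
		return false;
	if (data & ICR_DEST_SHORTHAND_MASK)
		return false;
	return true;	/* destination APIC ID is in bits 63:32 */
}

int main(void)
{
	/* Vector 0x20, fixed delivery, physical mode, target APIC ID 3. */
	uint64_t icr = (3ULL << 32) | 0x20;

	printf("fast path eligible: %d\n",
	       fastpath_candidate_ipi(X2APIC_ICR_MSR, icr));
	return 0;
}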
>
> May I suggest an alternative phrasing? Something such as:

Great, thanks for the better wording.

>
> ---
> This patch introduces a mechanism to handle certain performance-critical
> WRMSRs at a very early stage of the KVM VMExit handler.
>
> This mechanism is specifically used to accelerate writes to the x2APIC ICR
> that attempt to send a virtual IPI with physical destination mode, fixed
> delivery mode and a single target, which was found to be one of the main
> causes of VMExits for Linux workloads.
>
> This mechanism significantly reduces the latency of such virtual IPIs by
> sending the physical IPI to the target vCPU at a very early stage of the
> KVM VMExit handler, before host interrupts are enabled and before expensive
> operations such as reacquiring KVM's SRCU lock.
> Latency is reduced even further when KVM is able to use the APICv
> posted-interrupt mechanism (which delivers the virtual IPI directly to the
> target vCPU without the need to kick it to the host).
> ---
>
> >
> > Testing on Xeon Skylake server:
> >
> > The virtual IPI latency, from the sender sending to the receiver
> > receiving, is reduced by more than 200 CPU cycles.
> >
> > Cc: Paolo Bonzini <pbonzini@xxxxxxxxxx>
> > Cc: Radim Krčmář <rkrcmar@xxxxxxxxxx>
> > Cc: Sean Christopherson <sean.j.christopherson@xxxxxxxxx>
> > Cc: Vitaly Kuznetsov <vkuznets@xxxxxxxxxx>
> > Cc: Liran Alon <liran.alon@xxxxxxxxxx>
> > Signed-off-by: Wanpeng Li <wanpengli@xxxxxxxxxxx>
>
> I see you used the code I provided in my reply to v2. :)
> I had only some very minor comments inline below. Therefore:
> Reviewed-by: Liran Alon <liran.alon@xxxxxxxxxx>

Thanks, I will handle them in v4.

>
> Thanks for doing this optimisation.

Thanks to everybody who helped make this work nicely. :)

Wanpeng