Re: [PATCH 0/4] x86/hyper-v: optimize PV IPIs

From: Vitaly Kuznetsov
Date: Thu Jun 28 2018 - 12:27:54 EST


Vitaly Kuznetsov <vkuznets@xxxxxxxxxx> writes:

> Wanpeng Li <kernellwp@xxxxxxxxx> writes:
>
>> Hi Vitaly, (fix my reply mess this time)
>> On Sat, 23 Jun 2018 at 01:09, Vitaly Kuznetsov <vkuznets@xxxxxxxxxx> wrote:
>>>
>>> When reviewing my "x86/hyper-v: use cheaper HVCALL_FLUSH_VIRTUAL_ADDRESS_
>>> {LIST,SPACE} hypercalls when possible" patch Michael suggested applying the
>>> same idea to PV IPIs. Here we go!
>>>
>>> Despite what Hyper-V TLFS says about the HVCALL_SEND_IPI hypercall, it can
>>> actually be 'fast' (passing parameters through registers). Use that too.
>>>
>>> This series can collide with my "KVM: x86: hyperv: PV IPI support for
>>> Windows guests" series as I rename ipi_arg_non_ex/ipi_arg_ex structures
>>> there. Depending on which one gets in first we may need to do tiny
>>> adjustments.
>>
>> As hyperv PV TLB flush has already been merged, are there any other
>> obvious multicast IPI scenarios? qemu has supported interrupt remapping
>> for two years now, and I think a Windows guest can switch to cluster mode
>> after entering x2APIC, so it sends IPIs per cluster. In addition, you
>> could also post benchmark results for this PV IPI optimization,
>> although it also fixes the bug which you mentioned above.
>
> I got confused, which of my patch series are you actually looking at?
> :-)
>
> This particular one ("x86/hyper-v: optimize PV IPIs") is not about
> KVM/qemu, it is for Linux running on top of a real Hyper-V server. We
> already support PV IPIs and here I'm just trying to optimize the way
> we send them by switching to a cheaper hypercall (and using the 'fast'
> version of it) when possible. I don't actually have a good benchmark
> (and I don't remember seeing one when K.Y. posted PV IPI support) but
> this can be arranged I guess: I can write a dumb 'IPI sender' in kernel
> and send e.g. 1000 IPIs.

So I used the IPI benchmark (https://lkml.org/lkml/2017/12/19/141,
thanks for the tip!) on this series. On a 16 vCPU guest (WS2016) I'm
getting the following:

Before:
Dry-run:               0      203110
Self-IPI:        6167430    11645550
Normal IPI:    380479300   475881820
Broadcast IPI:         0  2557371420

After:
Dry-run:               0      214280  (not interesting)
Self-IPI:        5706210    10697640  (-8%)
Normal IPI:    379330010   450158830  (-5%)
Broadcast IPI:         0  2340427160  (-8%)
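
The percentages above can be reproduced from the second column of each
row. That they refer to the second column is an assumption (the first
column barely moves for Normal IPI), and the helper name pct_change is
made up for this sketch:

```python
# Second-column values copied from the Before/After results above.
# Assumption: the quoted percentages are based on this column.
before = {
    "Self-IPI": 11645550,
    "Normal IPI": 475881820,
    "Broadcast IPI": 2557371420,
}
after = {
    "Self-IPI": 10697640,
    "Normal IPI": 450158830,
    "Broadcast IPI": 2340427160,
}


def pct_change(old, new):
    """Relative change from old to new in percent (negative means lower)."""
    return (new - old) / old * 100.0


for name in before:
    print(f"{name}: {pct_change(before[name], after[name]):+.1f}%")
```

Rounded to whole percent this gives the -8%, -5% and -8% quoted above.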

--
Vitaly