Re: [PATCH 0/2] KVM: enable halt poll shrink parameter

From: Sean Christopherson
Date: Fri May 03 2024 - 17:48:58 EST


On Thu, Nov 02, 2023, Parshuram Sangle wrote:
> KVM halt polling interval growth and shrink behavior has evolved since its
> inception. The current mechanism adjusts the polling interval based on whether
> a vcpu wakeup was received during the polling interval, using the grow and
> shrink parameter values. Though the grow parameter is logically set to 2 by
> default, the shrink parameter is kept disabled (set to 0).
>
> Disabled shrink has two issues:
> 1) Resets the polling interval to 0 on every unsuccessful poll, assuming a
> vcpu wakeup is less likely to arrive in further shrunk intervals.
> 2) Even on a successful poll, if the total block time is greater than or equal
> to the current poll_ns value, the polling interval is reset to 0 instead of
> shrinking gradually.
>
> These aspects reduce the chances of receiving a valid wakeup during polling
> and lose potential performance benefits for VM workloads.
>
> Below is a summary of the experiments conducted to assess the performance and
> power impact of enabling the halt_poll_ns_shrink parameter (value set to 2).
>
> Performance Test Summary: (Higher is better)
> --------------------------------------------
> Platform Details: Chrome Brya platform
> CPU - Alder Lake (12th Gen Intel CPU i7-1255U)
> Host kernel version - 5.15.127-20371-g710a1611ad33
>
> Android VM workload (Score)    Base     Shrink Enabled (value 2)   Delta
> ---------------------------------------------------------------------------
> GeekBench Multi-core(CPU)      5754     5856                       2%
> 3D Mark Slingshot(CPU+GPU)     15486    15885                      3%
> Stream (handopt)(Memory)       20566    21594                      5%
> fio seq-read (Storage)         727      747                        3%
> fio seq-write (Storage)        331      343                        3%
> fio rand-read (Storage)        690      732                        6%
> fio rand-write (Storage)       299      300                        1%
>
> Steam Gaming VM (Avg FPS)      Base     Shrink Enabled (value 2)   Delta
> ---------------------------------------------------------------------------
> Metro Redux (OpenGL)           54.80    59.60                      9%
> Dota 2 (Open GL)               48.74    51.40                      5%
> Dota 2 (Vulkan)                20.80    21.10                      1%
> SpaceShip (Vulkan)             20.40    21.52                      6%
>
> With shrink enabled, the majority of workloads show a higher percentage of
> successful polls. The reduced latency of returning control to the VM and the
> avoided overhead of vm_exit contribute to these performance gains.
>
> Power Impact Assessment Summary: (Lower is better)
> --------------------------------------------------
> Method : DAQ measurements of CPU and Memory rails
>
> CPU+Memory (Watt)              Base     Shrink Enabled (value 2)   Delta
> ---------------------------------------------------------------------------
> Idle* (Host)                   0.636    0.631                      -0.8%
> Video Playback (Host)          2.225    2.210                      -0.7%
> Tomb Raider (VM)               17.261   17.175                     -0.5%
> SpaceShip Benchmark(VM)        17.079   17.123                     0.3%
>
> *Idle power - Idle system with no application running; Android and Borealis
> VMs enabled but running no workload. Duration: 180 sec.
>
> Power measurements for the Chrome idle scenario and an active gaming VM
> workload show negligible power overhead, since the additional polling creates
> very short bursts which are unlikely to have otherwise gone to a complete
> idle CPU state.
>
> NOTE: No tests were conducted on non-x86 platforms with this changed config.
>
> The default values of the grow and shrink parameters are commonly used by
> various VM deployments unless specifically tuned for performance. Hence,
> based on the performance and power measurement results shown above, it is
> recommended to have shrink enabled (with value 2) by default so that there
> is no need to explicitly set this parameter through the kernel cmdline or by
> other means.

I am by no means an expert on halt polling or power management, but all of this
seems like a reasonable tradeoff. And even without the numbers you provided,
starting from scratch after a single failure is rather odd.
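
To make the "starting from scratch" point concrete, here is a rough userspace
model of the shrink step as the cover letter describes it. This is only a
sketch, not the code in the tree: shrink_poll_ns() is a made-up name, and it
ignores any other tunables or clamping the real code may apply.

```c
/*
 * Simplified model of the halt-poll shrink step (illustrative only).
 * shrink_poll_ns() is a hypothetical helper, not a KVM function.
 *
 * poll_ns: current polling interval in nanoseconds
 * shrink:  value of the shrink parameter (0 means "disabled")
 */
static unsigned int shrink_poll_ns(unsigned int poll_ns, unsigned int shrink)
{
	/* Disabled shrink: the interval is reset to 0 in one step. */
	if (shrink == 0)
		return 0;

	/* Enabled shrink: back off gradually by dividing the interval. */
	return poll_ns / shrink;
}
```

E.g. taking 200000 ns purely as an illustrative starting interval, a failed
poll gives 0 with shrink=0, versus 100000 with shrink=2, so the next halt still
gets a chance to poll.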

So unless someone objects, I'll plan on applying this for 6.11 in a few weeks
(after the 6.10 merge window closes).