On Sun, 2013-06-02 at 00:51 +0530, Raghavendra K T wrote:

This series replaces the existing paravirtualized spinlock mechanism
with a paravirtualized ticketlock mechanism. The series provides
implementation for both Xen and KVM.
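[For readers unfamiliar with the mechanism, here is a minimal, self-contained
sketch of the spin-then-halt idea the series implements. It is not the patch
itself: pv_wait(), pv_kick() and struct ticketlock are illustrative
placeholders for the hypercall-backed interfaces, and memory-ordering and
ticket-overflow details are deliberately omitted.]

#include <stdint.h>

#define SPIN_THRESHOLD (1u << 15)	/* ~32k spins before halting; the V9 value */

struct ticketlock {
	volatile uint16_t head;		/* ticket currently being served */
	volatile uint16_t tail;		/* next ticket to hand out */
};

/*
 * Placeholders for the hypercall-backed helpers a real implementation
 * would provide: halt this vCPU until kicked / kick a halted waiter.
 */
static void pv_wait(struct ticketlock *lock, uint16_t ticket) { (void)lock; (void)ticket; }
static void pv_kick(struct ticketlock *lock, uint16_t ticket) { (void)lock; (void)ticket; }

static void ticket_lock(struct ticketlock *lock)
{
	uint16_t ticket = __sync_fetch_and_add(&lock->tail, 1);

	for (;;) {
		unsigned int count = SPIN_THRESHOLD;

		/* Spin for a bounded number of iterations first. */
		while (count--) {
			if (lock->head == ticket)
				return;			/* our turn, lock acquired */
			__builtin_ia32_pause();		/* PAUSE hint while spinning */
		}

		/*
		 * Spun too long: tell the hypervisor which ticket we are
		 * waiting for and halt instead of burning host CPU.
		 */
		pv_wait(lock, ticket);
	}
}

static void ticket_unlock(struct ticketlock *lock)
{
	uint16_t next = lock->head + 1;

	lock->head = next;
	/* Wake whichever vCPU (if any) halted waiting for this ticket. */
	pv_kick(lock, next);
}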
Changes in V9:
- Changed spin_threshold to 32k to avoid excess halt exits that are
causing undercommit degradation (after PLE handler improvement).
- Added kvm_irq_delivery_to_apic (suggested by Gleb)
- Optimized halt exit path to use PLE handler
V8 of PVspinlock was posted last year. Following Avi's suggestion to look
at improving the PLE handler first, various optimizations in PLE handling
were tried.
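[As context for the PLE numbers below, here is a rough conceptual sketch of
what PLE/halt-exit handling does: when a vCPU pause-loops or halts, the host
tries to donate its timeslice to a preempted vCPU of the same guest rather
than letting the spinner burn CPU. All types and names here (struct vcpu,
yield_to(), on_pause_loop_exit()) are invented for illustration and are not
KVM's.]

#include <stdbool.h>
#include <stddef.h>

/* Illustrative stand-ins for host-side structures. */
struct vcpu {
	int id;
	bool preempted_in_guest;	/* descheduled while it still had work */
};

struct guest {
	struct vcpu *vcpus;
	size_t nr_vcpus;
	size_t last_boosted;		/* round-robin start point */
};

/* Placeholder for the scheduler primitive that donates our timeslice. */
static bool yield_to(struct vcpu *target)
{
	return target != NULL;
}

/*
 * Conceptual handler: the spinning/halting vCPU is wasting its timeslice,
 * so scan the guest's other vCPUs for one that was preempted and boost it,
 * in the hope that it is the lock holder (or the next waiter in line).
 */
static void on_pause_loop_exit(struct guest *g, struct vcpu *spinner)
{
	size_t i;

	for (i = 0; i < g->nr_vcpus; i++) {
		size_t idx = (g->last_boosted + 1 + i) % g->nr_vcpus;
		struct vcpu *candidate = &g->vcpus[idx];

		if (candidate == spinner || !candidate->preempted_in_guest)
			continue;

		if (yield_to(candidate)) {
			g->last_boosted = idx;
			return;
		}
	}
	/* No suitable candidate: the spinner keeps (or halts) its slice. */
}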
Sorry for not posting this sooner. I have tested the v9 pv-ticketlock
patches in 1x and 2x over-commit with 10-vCPU and 20-vCPU VMs, both with
and without PLE, as PLE is still not scalable with large VMs.
System: x3850X5, 40 cores, 80 threads
1x over-commit with 10-vCPU VMs (8 VMs) all running dbench:
----------------------------------------------------------
Configuration           Total Throughput (MB/s)   Notes
3.10-default-ple_on     22945                     5% CPU in host kernel, 2% spin_lock in guests
3.10-default-ple_off    23184                     5% CPU in host kernel, 2% spin_lock in guests
3.10-pvticket-ple_on    22895                     5% CPU in host kernel, 2% spin_lock in guests
3.10-pvticket-ple_off   23051                     5% CPU in host kernel, 2% spin_lock in guests
[all 1x results look good here]
2x over-commit with 10-vCPU VMs (16 VMs) all running dbench:
-----------------------------------------------------------
Configuration           Total Throughput (MB/s)   Notes
3.10-default-ple_on     6287                      55% CPU in host kernel, 17% spin_lock in guests
3.10-default-ple_off    1849                      2% CPU in host kernel, 95% spin_lock in guests
3.10-pvticket-ple_on    6691                      50% CPU in host kernel, 15% spin_lock in guests
3.10-pvticket-ple_off   16464                     8% CPU in host kernel, 33% spin_lock in guests
[PLE hinders pv-ticket improvements, but even with PLE off,
we are still off from ideal throughput (somewhere >20000)]
1x over-commit with 20-vCPU VMs (4 VMs) all running dbench:
----------------------------------------------------------
Configuration           Total Throughput (MB/s)   Notes
3.10-default-ple_on     22736                     6% CPU in host kernel, 3% spin_lock in guests
3.10-default-ple_off    23377                     5% CPU in host kernel, 3% spin_lock in guests
3.10-pvticket-ple_on    22471                     6% CPU in host kernel, 3% spin_lock in guests
3.10-pvticket-ple_off   23445                     5% CPU in host kernel, 3% spin_lock in guests
[1x looking fine here]
2x over-commit with 20-vCPU VMs (8 VMs) all running dbench:
----------------------------------------------------------
Configuration           Total Throughput (MB/s)   Notes
3.10-default-ple_on     1965                      70% CPU in host kernel, 34% spin_lock in guests
3.10-default-ple_off    226                       2% CPU in host kernel, 94% spin_lock in guests
3.10-pvticket-ple_on    1942                      70% CPU in host kernel, 35% spin_lock in guests
3.10-pvticket-ple_off   8003                      11% CPU in host kernel, 70% spin_lock in guests
[quite bad all around, but pv-tickets with PLE off is the best result so far.
Still quite a bit off from ideal throughput]
In summary, I would state that pv-ticketlocks are an overall win, but the
current PLE handler tends to "get in the way" on these larger guests.
-Andrew