Re: [RFC PATCH v2 00/17] Core scheduling v2

From: Julien Desfossez
Date: Thu Apr 25 2019 - 10:36:55 EST


On 23-Apr-2019 04:18:05 PM, Vineeth Remanan Pillai wrote:
> Second iteration of the core-scheduling feature.
>
> This version fixes apparent bugs and performance issues in v1. This
> doesn't fully address the issue of core sharing between processes
> with different tags. Core sharing still happens 1% to 5% of the time
> based on the nature of workload and timing of the runnable processes.
>
> Changes in v2
> -------------
> - rebased on mainline commit: 6d906f99817951e2257d577656899da02bb33105

Here are our benchmark results.

Environment setup:
------------------
Skylake server, 2 numa nodes, total 72 CPUs with HT on
Workload in KVM virtual machines, one cpu cgroup per VM (including qemu
and vhost threads)


Case 1: MySQL TPC-C
-------------------
1 12-vcpus-32gb MySQL server per numa node (clients on another physical
machine)
96 semi-idle 1-vcpu-512mb VMs per numa node (sending metrics over a VPN
every 15 seconds)
--> 3 vcpus per hardware thread (108 vcpus on 36 hardware threads per
numa node)
Average of 10 5-minute runs.
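
For anyone who wants to double-check the 3:1 vcpu ratio quoted above, here
is a minimal Python sketch of the arithmetic. It assumes the VMs are
confined to their own numa node, that the ratio is counted against
hardware threads (logical CPUs), and that each core has 2 hardware
threads; the variable and function names are only illustrative.

```python
# Values taken from the environment and Case 1 descriptions.
TOTAL_HW_THREADS = 72   # whole machine, HT on
NUMA_NODES = 2

MYSQL_VMS_PER_NODE = 1
MYSQL_VCPUS = 12
IDLE_VMS_PER_NODE = 96
IDLE_VCPUS = 1


def overcommit(vcpus, hw_threads):
    """Return the number of vcpus per hardware thread."""
    return vcpus / hw_threads


hw_threads_per_node = TOTAL_HW_THREADS // NUMA_NODES          # 36
vcpus_per_node = (MYSQL_VMS_PER_NODE * MYSQL_VCPUS
                  + IDLE_VMS_PER_NODE * IDLE_VCPUS)           # 108

# Assumption: 2 hardware threads per core, so nosmt halves the count.
ratio_smt_on = overcommit(vcpus_per_node, hw_threads_per_node)
ratio_smt_off = overcommit(vcpus_per_node, hw_threads_per_node // 2)

print(f"SMT on:  {ratio_smt_on:.1f} vcpus per hardware thread")   # 3.0
print(f"SMT off: {ratio_smt_off:.1f} vcpus per hardware thread")  # 6.0
```

Under those assumptions the nosmt run works at twice the overcommit
ratio of the baseline.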

- baseline:
  - avg tps: 1878
  - stdev tps: 47
- nosmt:
  - avg tps: 959 (-49% from baseline)
  - stdev tps: 35
- core scheduling:
  - avg tps: 1406 (-25% from baseline)
  - stdev tps: 48
  - Co-scheduling stats (5 minutes sample):
    - 48.9% VM threads
    - 49.6% idle
    - 1.3% foreign threads

So in v2, even this very noisy test case benefits from core scheduling
over nosmt (the baseline is also better than in v1, so we probably
benefit from other changes in the kernel).
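
The "from baseline" percentages above are plain relative changes of the
average tps. A small Python sketch to reproduce them from the rounded
averages in this mail (the helper name pct_change is just illustrative):

```python
def pct_change(value, baseline):
    """Relative change of value versus baseline, in percent."""
    return (value - baseline) / baseline * 100.0


baseline_tps = 1878
results = {"nosmt": 959, "core scheduling": 1406}

for name, tps in results.items():
    print(f"{name}: {pct_change(tps, baseline_tps):+.1f}% from baseline")
# nosmt: -48.9% from baseline
# core scheduling: -25.1% from baseline
```

Since the inputs are already rounded, results computed this way can
differ slightly from figures derived from the raw per-run data.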


Case 2: linpack with enough room
--------------------------------
2 12-vcpus-32gb linpack VMs both pinned on the same NUMA node (36
hardware threads with SMT on).
100k context switches/sec.
Average of 5 15-minute runs.

- baseline:
  - avg gflops: 403
  - stdev: 20
- nosmt:
  - avg gflops: 355 (-12% from baseline)
  - stdev: 28
- core scheduling:
  - avg gflops: 364 (-9% from baseline)
  - stdev: 59
  - Co-scheduling stats (5 minutes sample):
    - 39.3% VM threads
    - 59.3% idle
    - 0.07% foreign threads

No real difference between nosmt and core scheduling when there is
enough room to run a cpu-intensive workload even with smt off.
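
One rough way to sanity-check "no real difference" is to compare the gap
between the two averages with the run-to-run noise. The sketch below
computes a Welch-style t statistic from the numbers above. It assumes
the reported stdev is the sample standard deviation over the 5 runs,
which this mail does not state, so treat it as an illustration rather
than a proper significance test.

```python
import math


def welch_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Welch t statistic for two independent samples with unequal variance."""
    standard_error = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return (mean_a - mean_b) / standard_error


RUNS = 5  # "Average of 5" runs per configuration

# Case 2: core scheduling (364 +/- 59) versus nosmt (355 +/- 28) gflops.
t = welch_t(364, 59, RUNS, 355, 28, RUNS)
print(f"t = {t:.2f}")
```

The statistic comes out well below 1, i.e. the 9 gflops gap is much
smaller than the spread between runs.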


Case 3: full node linpack
-------------------------
3 12-vcpus-32gb linpack VMs all pinned on the same NUMA node (36
hardware threads with SMT on).
155k context switches/sec
Average of 5 15-minute runs.

- baseline:
  - avg gflops: 270
  - stdev: 5
- nosmt (switching to 2:1 ratio of vcpu to hardware threads):
  - avg gflops: 209 (-22.46% from baseline)
  - stdev: 6.2
- core scheduling:
  - avg gflops: 269 (-0.11% from baseline)
  - stdev: 5.7
  - Co-scheduling stats (5 minutes sample):
    - 93.7% VM threads
    - 6.3% idle
    - 0.04% foreign threads

Here core scheduling is a major performance improvement over nosmt.
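
To put a number on this from the rounded averages above, here is a short
Python sketch. It also recomputes the 2:1 vcpu ratio for the nosmt run,
assuming 2 hardware threads per core; the names are illustrative only.

```python
VMS = 3
VCPUS_PER_VM = 12
HW_THREADS_SMT_ON = 36  # per NUMA node, from the test description

vcpus = VMS * VCPUS_PER_VM                     # 36
# Assumption: 2 hardware threads per core, so nosmt leaves half of them.
hw_threads_smt_off = HW_THREADS_SMT_ON // 2    # 18

ratio_smt_on = vcpus / HW_THREADS_SMT_ON       # 1.0
ratio_smt_off = vcpus / hw_threads_smt_off     # 2.0

nosmt_gflops = 209
core_sched_gflops = 269
gain_pct = (core_sched_gflops - nosmt_gflops) / nosmt_gflops * 100.0

print(f"vcpu:thread ratio, SMT on:  {ratio_smt_on:.0f}:1")
print(f"vcpu:thread ratio, SMT off: {ratio_smt_off:.0f}:1")
print(f"core scheduling vs nosmt:   {gain_pct:+.1f}% gflops")
```

With these inputs core scheduling delivers roughly 29% more gflops than
nosmt on the fully loaded node.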

Julien