Re: [RFC PATCH v2 00/17] Core scheduling v2

From: Aubrey Li
Date: Wed Apr 24 2019 - 23:15:24 EST


On Wed, Apr 24, 2019 at 10:00 PM Julien Desfossez
<jdesfossez@xxxxxxxxxxxxxxxx> wrote:
>
> On 24-Apr-2019 09:13:10 PM, Aubrey Li wrote:
> > On Wed, Apr 24, 2019 at 12:18 AM Vineeth Remanan Pillai
> > <vpillai@xxxxxxxxxxxxxxxx> wrote:
> > >
> > > Second iteration of the core-scheduling feature.
> > >
> > > This version fixes apparent bugs and performance issues in v1. This
> > > doesn't fully address the issue of core sharing between processes
> > > with different tags. Core sharing still happens 1% to 5% of the time
> > > based on the nature of workload and timing of the runnable processes.
> > >
> > > Changes in v2
> > > -------------
> > > - rebased on mainline commit: 6d906f99817951e2257d577656899da02bb33105
> >
> > Thanks for posting v2. Based on this version, here are my benchmark results.
> >
> > Environment setup
> > --------------------------
> > Skylake server, 2 NUMA nodes, 104 CPUs (HT on)
> > cgroup1 workload, sysbench (CPU-intensive, non-AVX workload)
> > cgroup2 workload, gemmbench (AVX512 workload)
> >
> > Case 1: task number < CPU number
> > --------------------------------------------
> > 36 sysbench threads in cgroup1
> > 36 gemmbench threads in cgroup2
> >
> > core sched off:
> > - sysbench 95th percentile latency(ms): avg = 4.952, stddev = 0.55342
> > core sched on:
> > - sysbench 95th percentile latency(ms): avg = 3.549, stddev = 0.04449
> >
> > Due to core cookie matching, sysbench tasks aren't affected by the AVX512
> > tasks, so latency improves by ~28%!!!
> >
> > Case 2: task number > CPU number
> > -------------------------------------------------
> > 72 sysbench threads in cgroup1
> > 72 gemmbench threads in cgroup2
> >
> > core sched off:
> > - sysbench 95th percentile latency(ms): avg = 11.914, stddev = 3.259
> > core sched on:
> > - sysbench 95th percentile latency(ms): avg = 13.289, stddev = 4.863
> >
> > So it's not only power: security and performance are now at odds, too.
> > Because core cookies don't match and forced idle is introduced, latency
> > regresses by ~12%.
> >
> > Any comments?
>
> Would it be possible to post the results with HT off as well ?

What's the point of turning HT off here? The latency is sensitive to the
relationship between the task count and the CPU count. Usually, fewer CPUs
mean more run queue wait time and a worse result.
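To make the task-count/CPU-count relationship concrete, here is a small sketch that recomputes the two cases from the numbers quoted above. It assumes 2 hyperthreads per core, so turning HT off would leave 52 CPUs; the `load_ratio` and `relative_change` helpers are illustrative only and are not part of the patch set.

```python
# Numbers copied from the benchmark report quoted above: 104 logical CPUs
# with HT on, sysbench 95th percentile latency averages in ms.
LOGICAL_CPUS = 104
# Assumption: 2 hyperthreads per core, so HT off would expose 52 CPUs.
PHYSICAL_CORES = LOGICAL_CPUS // 2

cases = {
    # name: (total runnable threads, latency core sched off, latency on)
    "case1": (36 + 36, 4.952, 3.549),
    "case2": (72 + 72, 11.914, 13.289),
}


def relative_change(before: float, after: float) -> float:
    """Percent change from `before` to `after`; negative means lower latency."""
    return (after - before) / before * 100.0


def load_ratio(threads: int, cpus: int) -> float:
    """Runnable threads per CPU; above 1.0 some threads must wait in a run queue."""
    return threads / cpus


for name, (threads, off_ms, on_ms) in cases.items():
    print(
        f"{name}: {threads} threads, "
        f"{load_ratio(threads, LOGICAL_CPUS):.2f} per CPU (HT on), "
        f"{load_ratio(threads, PHYSICAL_CORES):.2f} per CPU (HT off), "
        f"latency change {relative_change(off_ms, on_ms):+.1f}%"
    )
```

Case 1 stays below one runnable thread per CPU with HT on, while case 2 (and case 1 with HT off) goes above it, which is where run queue wait time starts to dominate.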

Thanks,
-Aubrey