Re: [RFC PATCH v2 00/17] Core scheduling v2

From: Aubrey Li
Date: Sun Apr 28 2019 - 06:30:02 EST


On Sun, Apr 28, 2019 at 5:33 PM Ingo Molnar <mingo@xxxxxxxxxx> wrote:
> So because I'm a big fan of presenting data in a readable fashion, here
> are your results, tabulated:

I thought I had done my best to make it readable, but this looks much better,
thanks ;-)
>
> #
> # Sysbench throughput comparison of 3 different kernels at different
> # load levels, higher numbers are better:
> #
>
> .------------------------------------|--------------------------------------------------------------------.
> |  NA/AVX  vanilla-SMT     [stddev%] | coresched-SMT   [stddev%]      +/- |   no-SMT   [stddev%]      +/- |
> |------------------------------------|--------------------------------------------------------------------|
> |     1/1       508.5      [  0.2% ] |         504.7   [  1.1% ]     0.8% |    509.0   [  0.2% ]     0.1% |
> |     2/2      1000.2      [  1.4% ] |        1004.1   [  1.6% ]     0.4% |    997.6   [  1.2% ]     0.3% |
> |     4/4      1912.1      [  1.0% ] |        1904.2   [  1.1% ]     0.4% |   1914.9   [  1.3% ]     0.1% |
> |     8/8      3753.5      [  0.3% ] |        3748.2   [  0.3% ]     0.1% |   3751.3   [  0.4% ]     0.1% |
> |   16/16      7139.3      [  2.4% ] |        7137.9   [  1.8% ]     0.0% |   7049.2   [  2.4% ]     1.3% |
> |   32/32     10899.0      [  4.2% ] |       10780.3   [  4.4% ]    -1.1% |  10339.2   [  9.6% ]    -5.1% |
> |   64/64     15086.1      [ 11.5% ] |       14262.0   [  8.2% ]    -5.5% |  11168.7   [ 22.2% ]   -26.0% |
> | 128/128     15371.9      [ 22.0% ] |       14675.8   [ 14.4% ]    -4.5% |  10963.9   [ 18.5% ]   -28.7% |
> | 256/256     15990.8      [ 22.0% ] |       12227.9   [ 10.3% ]   -23.5% |  10469.9   [ 19.6% ]   -34.5% |
> '------------------------------------|--------------------------------------------------------------------'
>
> One major thing that sticks out is that if we compare the stddev numbers
> to the +/- comparisons then it's pretty clear that the benchmarks are
> very noisy: in all but the last row stddev is actually higher than the
> measured effect.
>
> So what does 'stddev' mean here, exactly? The stddev of multiple runs,
> i.e. measured run-to-run variance? Or is it some internal metric of the
> benchmark?
>

The benchmark reports intermediate statistics once per second; the raw log
looks like this:
[ 11s ] thds: 256 eps: 14346.72 lat (ms,95%): 44.17
[ 12s ] thds: 256 eps: 14328.45 lat (ms,95%): 44.17
[ 13s ] thds: 256 eps: 13773.06 lat (ms,95%): 43.39
[ 14s ] thds: 256 eps: 13752.31 lat (ms,95%): 43.39
[ 15s ] thds: 256 eps: 15362.79 lat (ms,95%): 43.39
[ 16s ] thds: 256 eps: 26580.65 lat (ms,95%): 35.59
[ 17s ] thds: 256 eps: 15011.78 lat (ms,95%): 36.89
[ 18s ] thds: 256 eps: 15025.78 lat (ms,95%): 39.65
[ 19s ] thds: 256 eps: 15350.87 lat (ms,95%): 39.65
[ 20s ] thds: 256 eps: 15491.70 lat (ms,95%): 36.89

I have a Python script that parses eps (events per second) and lat (latency)
out of this log and computes the average and stddev, so the stddev is taken
over these per-second samples. (I can also plot the curves locally.)
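
For illustration, here is a minimal sketch of that kind of parsing. It is not
the actual script: the regular expression, the function names and the choice
of the sample standard deviation are assumptions, and the sample input is just
the first three lines of the log above.

```python
import re
import statistics

# Matches per-second report lines such as:
#   [ 11s ] thds: 256 eps: 14346.72 lat (ms,95%): 44.17
# The exact spacing of the real log is assumed, so whitespace is matched loosely.
LINE_RE = re.compile(
    r"\[\s*\d+s\s*\]\s*thds:\s*\d+\s+eps:\s*([\d.]+)\s+lat\s*\(ms,95%\):\s*([\d.]+)"
)

SAMPLE_LOG = """\
[ 11s ] thds: 256 eps: 14346.72 lat (ms,95%): 44.17
[ 12s ] thds: 256 eps: 14328.45 lat (ms,95%): 44.17
[ 13s ] thds: 256 eps: 13773.06 lat (ms,95%): 43.39
"""


def parse_log(text):
    """Return two lists (eps, lat), one entry per matching report line."""
    eps, lat = [], []
    for line in text.splitlines():
        match = LINE_RE.search(line)
        if match:
            eps.append(float(match.group(1)))
            lat.append(float(match.group(2)))
    return eps, lat


def summarize(values):
    """Return (mean, stddev, stddev as a percentage of the mean)."""
    mean = statistics.mean(values)
    # Sample standard deviation; whether the real script uses the sample
    # or the population form is an assumption here.
    stddev = statistics.stdev(values)
    return mean, stddev, 100.0 * stddev / mean


if __name__ == "__main__":
    eps, lat = parse_log(SAMPLE_LOG)
    mean, stddev, pct = summarize(eps)
    print(f"eps: mean={mean:.1f} stddev={stddev:.1f} [{pct:.1f}%]")
```

Computed this way, the stddev describes the variation between one-second
samples within a single run, not between separate runs.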

It is indeed noisy when the number of tasks is greater than the number of
CPUs, probably because of frequent load balancing and context switching.
Do you have any suggestions? Or is there any other information I can provide?

Thanks,
-Aubrey