Re: [RFC PATCH v2 00/17] Core scheduling v2

From: Ingo Molnar
Date: Sun Apr 28 2019 - 05:33:15 EST



* Aubrey Li <aubrey.intel@xxxxxxxxx> wrote:

> > But what we are really interested in are throughput numbers under
> > these three kernel variants, right?
>
> These are sysbench events per second number, higher is better.
>
> NA/AVX baseline(std%) coresched(std%) +/- nosmt(std%) +/-
> 1/1 508.5( 0.2%) 504.7( 1.1%) -0.8% 509.0( 0.2%) 0.1%
> NA/AVX baseline(std%) coresched(std%) +/- nosmt(std%) +/-
> 2/2 1000.2( 1.4%) 1004.1( 1.6%) 0.4% 997.6( 1.2%) -0.3%
> NA/AVX baseline(std%) coresched(std%) +/- nosmt(std%) +/-
> 4/4 1912.1( 1.0%) 1904.2( 1.1%) -0.4% 1914.9( 1.3%) 0.1%
> NA/AVX baseline(std%) coresched(std%) +/- nosmt(std%) +/-
> 8/8 3753.5( 0.3%) 3748.2( 0.3%) -0.1% 3751.3( 0.4%) -0.1%
> NA/AVX baseline(std%) coresched(std%) +/- nosmt(std%) +/-
> 16/16 7139.3( 2.4%) 7137.9( 1.8%) -0.0% 7049.2( 2.4%) -1.3%
> NA/AVX baseline(std%) coresched(std%) +/- nosmt(std%) +/-
> 32/32 10899.0( 4.2%) 10780.3( 4.4%) -1.1% 10339.2( 9.6%) -5.1%
> NA/AVX baseline(std%) coresched(std%) +/- nosmt(std%) +/-
> 64/64 15086.1(11.5%) 14262.0( 8.2%) -5.5% 11168.7(22.2%) -26.0%
> NA/AVX baseline(std%) coresched(std%) +/- nosmt(std%) +/-
> 128/128 15371.9(22.0%) 14675.8(14.4%) -4.5% 10963.9(18.5%) -28.7%
> NA/AVX baseline(std%) coresched(std%) +/- nosmt(std%) +/-
> 256/256 15990.8(22.0%) 12227.9(10.3%) -23.5% 10469.9(19.6%) -34.5%

So because I'm a big fan of presenting data in a readable fashion, here
are your results, tabulated:

#
# Sysbench throughput comparison of 3 different kernels at different
# load levels, higher numbers are better:
#

.----------------------------------|------------------------------------------------------------------.
|  NA/AVX  vanilla-SMT   [stddev%] | coresched-SMT   [stddev%]     +/- |   no-SMT   [stddev%]     +/- |
|----------------------------------|------------------------------------------------------------------|
|     1/1        508.5   [  0.2% ] |         504.7   [  1.1% ]   -0.8% |    509.0   [  0.2% ]    0.1% |
|     2/2       1000.2   [  1.4% ] |        1004.1   [  1.6% ]    0.4% |    997.6   [  1.2% ]   -0.3% |
|     4/4       1912.1   [  1.0% ] |        1904.2   [  1.1% ]   -0.4% |   1914.9   [  1.3% ]    0.1% |
|     8/8       3753.5   [  0.3% ] |        3748.2   [  0.3% ]   -0.1% |   3751.3   [  0.4% ]   -0.1% |
|   16/16       7139.3   [  2.4% ] |        7137.9   [  1.8% ]   -0.0% |   7049.2   [  2.4% ]   -1.3% |
|   32/32      10899.0   [  4.2% ] |       10780.3   [  4.4% ]   -1.1% |  10339.2   [  9.6% ]   -5.1% |
|   64/64      15086.1   [ 11.5% ] |       14262.0   [  8.2% ]   -5.5% |  11168.7   [ 22.2% ]  -26.0% |
| 128/128      15371.9   [ 22.0% ] |       14675.8   [ 14.4% ]   -4.5% |  10963.9   [ 18.5% ]  -28.7% |
| 256/256      15990.8   [ 22.0% ] |       12227.9   [ 10.3% ]  -23.5% |  10469.9   [ 19.6% ]  -34.5% |
'----------------------------------|------------------------------------------------------------------'

One major thing that sticks out is that if we compare the stddev numbers
to the +/- comparisons then it's pretty clear that the benchmarks are
very noisy: for coresched-SMT, in all but the last row the stddev is
actually higher than the measured effect, and for no-SMT the same holds
for every row up to and including 32/32.
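
To make that comparison reproducible, here is a rough Python sketch that
recomputes the +/- column from the raw throughput numbers and flags the
rows where the effect is larger than the reported stddev. Two assumptions
that are mine and not established anywhere in this thread: that stddev%
is a relative run-to-run standard deviation, and that "effect larger than
the bigger of the two stddevs" is a usable noise threshold. It is a crude
eyeball check, not a significance test, and the function names are made
up for illustration.

```python
# Rows copied from the table above:
# (NA/AVX, vanilla, vanilla_std%, coresched, coresched_std%, nosmt, nosmt_std%)
ROWS = [
    ("1/1",       508.5,  0.2,   504.7,  1.1,   509.0,  0.2),
    ("2/2",      1000.2,  1.4,  1004.1,  1.6,   997.6,  1.2),
    ("4/4",      1912.1,  1.0,  1904.2,  1.1,  1914.9,  1.3),
    ("8/8",      3753.5,  0.3,  3748.2,  0.3,  3751.3,  0.4),
    ("16/16",    7139.3,  2.4,  7137.9,  1.8,  7049.2,  2.4),
    ("32/32",   10899.0,  4.2, 10780.3,  4.4, 10339.2,  9.6),
    ("64/64",   15086.1, 11.5, 14262.0,  8.2, 11168.7, 22.2),
    ("128/128", 15371.9, 22.0, 14675.8, 14.4, 10963.9, 18.5),
    ("256/256", 15990.8, 22.0, 12227.9, 10.3, 10469.9, 19.6),
]


def delta_pct(base, variant):
    """Relative change of variant versus base, in percent."""
    return (variant - base) / base * 100.0


def exceeds_noise(base, base_std, variant, variant_std):
    """Assumed threshold: |effect| must beat the larger of the two stddev% values."""
    return abs(delta_pct(base, variant)) > max(base_std, variant_std)


def rows_above_noise(rows):
    """Return, per kernel variant, the load levels whose effect exceeds the stddev."""
    result = {"coresched": [], "nosmt": []}
    for label, base, base_std, cs, cs_std, ns, ns_std in rows:
        if exceeds_noise(base, base_std, cs, cs_std):
            result["coresched"].append(label)
        if exceeds_noise(base, base_std, ns, ns_std):
            result["nosmt"].append(label)
    return result


if __name__ == "__main__":
    for variant, labels in rows_above_noise(ROWS).items():
        print(f"{variant}: effect exceeds stddev at {labels}")
```

With the numbers above this flags only 256/256 for coresched, and 64/64,
128/128 and 256/256 for nosmt.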

So what does 'stddev' mean here, exactly? The stddev of multiple runs,
i.e. measured run-to-run variance? Or is it some internal metric of the
benchmark?

Thanks,

Ingo