Re: [PATCH v4 0/4] sched/fair: Capacity aware wakeup rework

From: Quentin Perret
Date: Fri Feb 07 2020 - 05:42:52 EST


On Thursday 06 Feb 2020 at 19:19:53 (+0000), Valentin Schneider wrote:
> Pixel3 (DynamIQ)
> ++++++++++++++++
>
> Ideally I would have used a DB845C but had a few issues with mine, so I
> went with a mainline-ish Pixel3 instead [1]. It's still the same SoC under
> the hood (Snapdragon 845), which has 4 bigs and 4 LITTLEs:
>
> +-------------------------------+
> |              L3               |
> +---+---+---+---+---+---+---+---+
> | L2| L2| L2| L2| L2| L2| L2| L2|
> +---+---+---+---+---+---+---+---+
> | L | L | L | L | B | B | B | B |
> +---+---+---+---+---+---+---+---+
>
> Default topology (single MC domain)
> -----------------------------------
>
> 100 iterations of 'hackbench -l 200'
>
> |      |   -PATCH |   +PATCH | DELTA (%) |
> |------+----------+----------+-----------|
> | mean | 1.131360 | 1.102560 |    -2.546 |
> | std  | 0.116322 | 0.101999 |   -12.313 |
> | min  | 0.935000 | 0.935000 |    +0.000 |
> | 50%  | 1.099000 | 1.097500 |    -0.136 |
> | 75%  | 1.211250 | 1.157750 |    -4.417 |
> | 99%  | 1.401020 | 1.338210 |    -4.483 |
> | max  | 1.502000 | 1.359000 |    -9.521 |
>
> 100 iterations of 'sysbench --max-time=5 --max-requests=-1 --test=threads --num-threads=8 run':
>
> |      |      -PATCH |      +PATCH | DELTA (%) |
> |------+-------------+-------------+-----------|
> | mean | 7108.310000 | 8731.610000 |   +22.837 |
> | std  |  199.431854 |  206.826912 |    +3.708 |
> | min  | 6655.000000 | 8251.000000 |   +23.982 |
> | 50%  | 7107.500000 | 8705.000000 |   +22.476 |
> | 75%  | 7255.500000 | 8868.250000 |   +22.228 |
> | 99%  | 7539.540000 | 9155.520000 |   +21.433 |
> | max  | 7593.000000 | 9207.000000 |   +21.256 |
>
> Phantom domains (MC + DIE)
> --------------------------
>
> This is mostly included for the sake of completeness.
>
> 100 iterations of 'sysbench --max-time=5 --max-requests=-1 --test=threads --num-threads=8 run':
>
> |      |      -PATCH |      +PATCH | DELTA (%) |
> |------+-------------+-------------+-----------|
> | mean | 7317.940000 | 9328.470000 |   +27.474 |
> | std  |  460.372682 |  181.528886 |   -60.569 |
> | min  | 5888.000000 | 8832.000000 |   +50.000 |
> | 50%  | 7271.000000 | 9348.000000 |   +28.566 |
> | 75%  | 7497.500000 | 9477.250000 |   +26.405 |
> | 99%  | 8464.390000 | 9634.160000 |   +13.820 |
> | max  | 8602.000000 | 9650.000000 |   +12.183 |


So, it feels like the most interesting test would be

'baseline w/ phantom domains' vs 'this patch w/o phantom domains'

right? The 'baseline w/o phantom domains' case is arguably borked today,
so it isn't that interesting (even though it performs well for the
particular workload you chose here, as expected, but I guess you might
see issues in others).

So, IIUC, based on your results above, that would be:

|      |     base+PD |  patch+noPD | DELTA (%) |
|------+-------------+-------------+-----------|
| mean | 7317.940000 | 8731.610000 |   +19.318 |
| std  |  460.372682 |  206.826912 |   -55.074 |
| min  | 5888.000000 | 8251.000000 |   +40.132 |
| 50%  | 7271.000000 | 8705.000000 |   +19.722 |
| 75%  | 7497.500000 | 8868.250000 |   +18.283 |
| 99%  | 8464.390000 | 9155.520000 |    +8.165 |
| max  | 8602.000000 | 9207.000000 |    +7.033 |

Is that correct?
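
FWIW, the DELTA column is just the relative change between the two
sysbench columns I picked from your tables. A quick sketch to
cross-check it (Python; the dict and function names are mine, purely for
illustration, and the values are copied from your '-PATCH' phantom
domains column and '+PATCH' default topology column):

```python
# base+PD: '-PATCH' column of the phantom domains (MC + DIE) sysbench table.
base_pd = {
    "mean": 7317.94, "std": 460.372682, "min": 5888.0, "50%": 7271.0,
    "75%": 7497.5, "99%": 8464.39, "max": 8602.0,
}
# patch+noPD: '+PATCH' column of the default topology sysbench table.
patch_nopd = {
    "mean": 8731.61, "std": 206.826912, "min": 8251.0, "50%": 8705.0,
    "75%": 8868.25, "99%": 9155.52, "max": 9207.0,
}


def delta_pct(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


deltas = {key: delta_pct(base_pd[key], patch_nopd[key]) for key in base_pd}

for key, delta in deltas.items():
    print(f"{key:>4} | {base_pd[key]:11.6f} | {patch_nopd[key]:11.6f} | {delta:+8.3f}")
```

Note that this compares summary statistics from two separate sets of
100 runs, so it says nothing about run-to-run pairing.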

If so, this patch series is still a very big win, and I'm all for
getting it merged. But I find it interesting that the results aren't as
good as having this patch _and_ phantom domains at the same time
(+27.5% on the mean over the same baseline, vs. +19.3% here)...

Any idea why having phantom domains helps? select_idle_capacity()
should behave the same w/ or w/o phantom domains given that you use
sd_asym_cpucapacity directly. I'm guessing something else has an impact
here? LB / misfit behaving a bit differently perhaps?
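
To spell out what I mean by "should behave the same", here is a toy
model of the wakeup scan as I understand it (Python, not the kernel
code). Everything in it is an assumption on my side: the capacity
values are made up, the ~20% fitting margin mirrors my reading of
fits_capacity(), and I'm assuming sd_asym_cpucapacity spans all 8 CPUs
in both topologies. The only topology-dependent input is the span, so
with an identical span the pick is identical:

```python
def fits_capacity(util, cap):
    # Assumed ~20% headroom: util must stay below roughly 80% of cap.
    return util * 1280 < cap * 1024


def select_idle_capacity(task_util, target, span, capacity, idle):
    """Toy model of a capacity-aware idle CPU scan.

    Walk the CPUs in `span` starting at `target` (wrapping around) and
    return the first idle CPU the task fits on. If none fits, return the
    idle CPU with the largest capacity, or -1 if no CPU is idle.
    """
    cpus = sorted(span)
    # Rotate the scan order so that it starts at `target`.
    order = [c for c in cpus if c >= target] + [c for c in cpus if c < target]

    best_cap, best_cpu = 0, -1
    for cpu in order:
        if not idle[cpu]:
            continue
        if fits_capacity(task_util, capacity[cpu]):
            return cpu
        if capacity[cpu] > best_cap:
            best_cap, best_cpu = capacity[cpu], cpu
    return best_cpu


# Hypothetical capacities: 4 LITTLEs followed by 4 bigs (values made up).
capacity = [400] * 4 + [1024] * 4
span = range(8)  # assumed identical w/ and w/o phantom domains

# A task too big for a LITTLE, waking on CPU1, with every CPU idle.
print(select_idle_capacity(600, 1, span, capacity, [True] * 8))
```

If that model is about right, any difference between the two setups
has to come from somewhere other than this scan.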

Thanks,
Quentin