Re: [PATCHv4 00/12] sched/fair: Migrate 'misfit' tasks on asymmetric capacity systems

From: Dietmar Eggemann
Date: Mon Jul 30 2018 - 10:30:34 EST


On 07/26/2018 07:14 PM, Valentin Schneider wrote:
Hi,

On 09/07/18 16:08, Morten Rasmussen wrote:
On Fri, Jul 06, 2018 at 12:18:27PM +0200, Vincent Guittot wrote:
Hi Morten,

On Wed, 4 Jul 2018 at 12:18, Morten Rasmussen <morten.rasmussen@xxxxxxx> wrote:

[...]

With that out of the way, I did some lmbench runs:
lat_mem_rd 10 1024

With ASYM_PACKING, I still see lmbench tasks remaining on LITTLE CPUs while
bigs are free, because ASYM_PACKING only does explicit active balancing on
CPU_NEWLY_IDLE balancing - otherwise it'll rely on the nr_balance_failed counter.

However, that counter can be reset before it reaches the threshold at which
active balance is done, which can lead to huge upmigration delays (almost a
full second). I also see the same kind of issues on Juno r0.

This could be resolved by extending ASYM_PACKING active balancing to
non NEWLY_IDLE cases, but then we'd be thrashing everything. That's another
argument for basing upmigration on task load-tracking signals, as we can
determine which tasks need active balancing much faster than the
nr_balance_failed counter way while not active balancing the world.
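
As a rough illustration of how a check based on task utilization can flag a task for upmigration without waiting for nr_balance_failed, here is a minimal sketch. The ~20% headroom (1280/1024) and the helper names are assumptions for illustration, not something stated in this thread; the capacity of 446 is the Juno little CPU value quoted further down.

```python
# Assumed headroom: a task "fits" a CPU only if its utilization stays
# below capacity * 1024 / 1280 (about 80% of the CPU capacity).
CAPACITY_MARGIN = 1280  # hypothetical value, scaled by 1024


def task_fits_capacity(task_util, cpu_capacity):
    """Return True if the task's utilization fits the CPU with headroom."""
    return cpu_capacity * 1024 > task_util * CAPACITY_MARGIN


def is_misfit(task_util, cpu_capacity):
    """A task that no longer fits its current CPU is an upmigration candidate."""
    return not task_fits_capacity(task_util, cpu_capacity)


LITTLE_CAPACITY = 446  # Juno little CPU, from cpu_capacity in this mail

for util in (223, 356, 357, 446):
    print(util, "misfit" if is_misfit(util, LITTLE_CAPACITY) else "fits")
```

With these assumed numbers, a task on a little CPU would be flagged once its utilization reaches 357, independent of how many load-balance attempts have failed.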

The task layout of the test is n=85 always-running tasks (each running for
~1.25ms, on big or little), created and run one after the other. On a big
CPU their util values therefore go from 512 to 1024, and on a little CPU
from 223 to 446 (Juno board). The latter is thanks to Quentin's
'sched/fair: Fix util_avg of new tasks for asymmetric systems'.

root@juno:~# cat /sys/devices/system/cpu/cpu[01]/cpu_capacity
446
1024
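
The starting values 512 and 223 are simply half of the respective CPU capacity. A minimal sketch of that arithmetic, assuming the task is created on an otherwise idle CPU (no existing runqueue utilization); the function name is hypothetical and this is not the kernel code itself:

```python
def initial_task_util(cpu_capacity):
    """Simplified model: a new task on an idle CPU starts at half capacity."""
    return cpu_capacity // 2


# Capacities as reported by cpu_capacity on Juno (cpu0 = little, cpu1 = big).
for name, capacity in (("little", 446), ("big", 1024)):
    start = initial_task_util(capacity)
    # An always-running task then ramps up towards the CPU capacity.
    print(f"{name}: util {start} -> {capacity}")
```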

(lat_mem_rd 10 1024) with ASYM_PACKING:

...
4.0 148.66 <-----
4.5 10.191
...
7.5 10.203
8.0 154.354 <-----

I ran the test affine to big, little and all cpus on tip/sched/core w/o ASYM_PACKING or Misfit:

cputype:       big    little       all
cpumask:      0x06      0x39      0xff

size(MB)  <------ latency (ns) ------>

 0.00098     3.668     3.595     3.669
 0.00195     3.668     3.594     3.594
 0.00293     3.668     3.593     3.595
 0.00391     3.669     3.596     3.595
...
 3.75000    58.687   121.934   122.293
 4.00000    57.054   121.771   120.489
 4.50000    56.914   121.851    56.729
 5.00000    57.347   121.777    56.975
 5.50000    57.705   121.738    68.981
 6.00000    57.935   121.728    57.542
 6.50000    58.119   121.694   121.799
 7.00000    58.194   121.502    57.844
 7.50000    58.258   121.684    58.050
 8.00000    58.293   121.725    58.030
 9.00000    58.309   121.793    58.188
10.00000    58.561   122.252   122.078

There is no difference between big and little CPUs for small memory sizes,
only in the MB range.
Looking at the trace for 'all', it turns out that there are cases in which,
even though the task only ran for ~15% of the time on big, the printed
latency value is the same as when it was running affine to big. So using the
latency value as an indicator of where the task was scheduled is IMHO not
really possible.
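
To put a number on that: if the printed value were a plain time-weighted average of the big and little latencies (an assumption made only for this comparison, not a statement about how lat_mem_rd computes its result), 15% of the time on big would give a figure close to the little one. A small sketch using rounded values from the table above:

```python
def time_weighted_latency(share_on_big, lat_big, lat_little):
    """Hypothetical model: latency averaged by time spent on each CPU type."""
    return share_on_big * lat_big + (1.0 - share_on_big) * lat_little


# Rounded MB-range latencies from the table (ns): ~58 on big, ~122 on little.
expected = time_weighted_latency(0.15, 58.0, 122.0)
print(f"time-weighted estimate: {expected:.1f} ns")
```

The estimate comes out at roughly 112 ns, while the 'all' column shows values around 57-58 ns in such cases, so the printed latency does not track where the task spent most of its time.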