Re: [PATCH v4 1/4] sched/fair: Add asymmetric CPU capacity wakeup scan

From: Quentin Perret
Date: Fri Feb 07 2020 - 06:01:18 EST


On Thursday 06 Feb 2020 at 19:19:54 (+0000), Valentin Schneider wrote:
> From: Morten Rasmussen <morten.rasmussen@xxxxxxx>
>
> Issue
> =====
>
> On asymmetric CPU capacity topologies, we currently rely on wake_cap() to
> drive select_task_rq_fair() towards either
>   - its slow-path (find_idlest_cpu()) if either the previous or
>     current (waking) CPU has too little capacity for the waking task
>   - its fast-path (select_idle_sibling()) otherwise
>
> Commit 3273163c6775 ("sched/fair: Let asymmetric CPU configurations balance
> at wake-up") points out that this relies on the assumption that "[...]the
> CPU capacities within an SD_SHARE_PKG_RESOURCES domain (sd_llc) are
> homogeneous".
>
> This assumption no longer holds on newer generations of big.LITTLE
> systems (DynamIQ), which can accommodate CPUs of different compute capacity
> within a single LLC domain. To hopefully paint a better picture, a regular
> big.LITTLE topology would look like this:
>
>   +---------+  +---------+
>   |   L2    |  |   L2    |
>   +----+----+  +----+----+
>   |CPU0|CPU1|  |CPU2|CPU3|
>   +----+----+  +----+----+
>       ^^^          ^^^
>     LITTLEs        bigs
>
> which would result in the following scheduler topology:
>
>   DIE [         ] <- sd_asym_cpucapacity
>   MC  [   ] [   ] <- sd_llc
>        0 1   2 3
>
> Conversely, a DynamIQ topology could look like:
>
>   +-------------------+
>   |        L3         |
>   +----+----+----+----+
>   | L2 | L2 | L2 | L2 |
>   +----+----+----+----+
>   |CPU0|CPU1|CPU2|CPU3|
>   +----+----+----+----+
>      ^^^^^     ^^^^^
>     LITTLEs    bigs
>
> which would result in the following scheduler topology:
>
>   MC [       ] <- sd_llc, sd_asym_cpucapacity
>       0 1 2 3
>
> What this means is that, on DynamIQ systems, we could pass the wake_cap()
> test (IOW presume the waking task fits on the CPU capacities of some LLC
> domain), thus go through select_idle_sibling().
> This function operates on an LLC domain, which here spans both bigs and
> LITTLEs, so it could very well pick a CPU of too small capacity for the
> task, despite there being fitting idle CPUs - it very much depends on the
> CPU iteration order, on which we have absolutely no guarantees
> capacity-wise.
>
> Implementation
> ==============
>
> Introduce yet another select_idle_sibling() helper function that takes CPU
> capacity into account. The policy is to pick the first idle CPU which is
> big enough for the task (task_util * margin < cpu_capacity). If no
> idle CPU is big enough, we pick the idle one with the highest capacity.
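
For readers following along, the policy quoted above can be sketched in
standalone C. This is only an illustration, not the code from the patch:
struct cpu_info, task_fits() and pick_idle_cpu() are hypothetical names, and
the 1280/1024 (roughly 25%) margin is an assumption, since the changelog only
says "margin".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for per-CPU state; not a kernel structure. */
struct cpu_info {
	bool idle;
	unsigned long capacity;	/* e.g. 1024 for the biggest CPU */
};

/* Assumed margin: the task fits if util * 1280 < capacity * 1024. */
#define MARGIN_NUM 1280UL
#define MARGIN_DEN 1024UL

static bool task_fits(unsigned long task_util, unsigned long capacity)
{
	return task_util * MARGIN_NUM < capacity * MARGIN_DEN;
}

/*
 * Return the index of the first idle CPU that fits the task. If no idle
 * CPU fits, return the idle CPU with the highest capacity. Return -1 if
 * no CPU is idle at all.
 */
static int pick_idle_cpu(const struct cpu_info *cpus, size_t n,
			 unsigned long task_util)
{
	unsigned long best_cap = 0;
	int best = -1;

	assert(cpus != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (!cpus[i].idle)
			continue;
		/* First idle CPU that is big enough wins immediately. */
		if (task_fits(task_util, cpus[i].capacity))
			return (int)i;
		/* Otherwise remember the biggest idle CPU seen so far. */
		if (best < 0 || cpus[i].capacity > best_cap) {
			best_cap = cpus[i].capacity;
			best = (int)i;
		}
	}
	return best;
}
```

With two LITTLEs of capacity 400 and two bigs of capacity 1024, all idle, a
task of utilization 500 would skip the LITTLEs and land on the first big; if
the bigs were busy, it would fall back to the first idle LITTLE.
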
>
> Unlike other select_idle_sibling() helpers, this one operates on the
> sd_asym_cpucapacity sched_domain pointer, which is guaranteed to span all
> known CPU capacities in the system. As such, this will work for both
> "legacy" big.LITTLE (LITTLEs & bigs split at MC, joined at DIE) and for
> newer DynamIQ systems (e.g. LITTLEs and bigs in the same MC domain).
>
> Note that this limits the scope of select_idle_sibling() to
> select_idle_capacity() for asymmetric CPU capacity systems - the LLC domain
> will not be scanned, and no further heuristic will be applied.
>
> Signed-off-by: Morten Rasmussen <morten.rasmussen@xxxxxxx>
> Co-developed-by: Valentin Schneider <valentin.schneider@xxxxxxx>
> Signed-off-by: Valentin Schneider <valentin.schneider@xxxxxxx>

Reviewed-by: Quentin Perret <qperret@xxxxxxxxxx>

Thanks,
Quentin