Re: [PATCH] sched/fair: fix case with reduced capacity CPU

From: Dietmar Eggemann
Date: Fri Jul 08 2022 - 06:11:12 EST


On 08/07/2022 09:17, Vincent Guittot wrote:
> On Thu, 7 Jul 2022 at 18:43, Dietmar Eggemann <dietmar.eggemann@xxxxxxx> wrote:
>>
>> On 02/07/2022 06:52, Vincent Guittot wrote:

[...]

>>> The rework of the load balance has filterd the case when the CPU is

s/filterd/filtered

>>> classified to be fully busy but its capacity is reduced.
>>>
>>> Check if CPU's capacity is reduced while gathering load balance statistics
>>> and classify it group_misfit_task instead of group_fully_busy so we can

enum group_type {

...
/*
* SD_ASYM_CPUCAPACITY only: One task doesn't fit with CPU's capacity
* and must be migrated to a more powerful CPU.
*/
group_misfit_task
...

This `SD_ASYM_CPUCAPACITY only:` should be removed now.

[...]

>>> @@ -8798,6 +8798,19 @@ sched_asym(struct lb_env *env, struct sd_lb_stats *sds, struct sg_lb_stats *sgs
>>> return sched_asym_prefer(env->dst_cpu, group->asym_prefer_cpu);
>>> }
>>>
>>> +static inline bool
>>> +sched_reduced_capacity(struct rq *rq, struct sched_domain *sd)

minor: Why not `static inline int check_reduced_capacity()` ? All
similar functions like check_cpu_capacity(), check_cpu_capacity() follow
this approach.

[...]

>>> @@ -8851,11 +8865,17 @@ static inline void update_sg_lb_stats(struct lb_env *env,
>>> if (local_group)
>>> continue;
>>>
>>> - /* Check for a misfit task on the cpu */
>>> - if (env->sd->flags & SD_ASYM_CPUCAPACITY &&
>>> - sgs->group_misfit_task_load < rq->misfit_task_load) {
>>> - sgs->group_misfit_task_load = rq->misfit_task_load;
>>> - *sg_status |= SG_OVERLOAD;
>>> + if (env->sd->flags & SD_ASYM_CPUCAPACITY) {
>>> + /* Check for a misfit task on the cpu */
>>> + if (sgs->group_misfit_task_load < rq->misfit_task_load) {
>>> + sgs->group_misfit_task_load = rq->misfit_task_load;
>>> + *sg_status |= SG_OVERLOAD;
>>> + }
>>> + } else if ((env->idle != CPU_NOT_IDLE) &&
>>> + sched_reduced_capacity(rq, env->sd) &&
>>> + (sgs->group_misfit_task_load < load)) {
>>> + /* Check for a task running on a CPU with reduced capacity */
>>> + sgs->group_misfit_task_load = load;
>>> }

Minor:

This now has if(A)
if(B)
else if(C && B')

little bit harder to read.

[...]

>> I'm wondering why you've chosen that hybrid approach `group_misfit_task
>> -> migrate_load` and not `group_misfit_task -> migrate_misfit`.
>
> because, it means enabling the tracking of misfit task on rq at each
> task enqueue/dequeue/tick ... Then mistfit for heterogeneous platform
> checks max_cpu_capacity what we don't care and will trigger unwanted
> misfit migration for smp

Agreed, rq->misfit_task_load can't be used here.

>> It looks like this `rq->cfs.h_nr_running = 1` case almost (since we
>> check `busiest->nr_running > 1`) always ends up in the load_balance()
>> `if (!ld_moved)` condition and need_active_balance() can return 1 in
>> case `if ((env->idle != CPU_NOT_IDLE) && ...` condition. This leads to
>> active load_balance and this
>>
>> IMHO, the same you can achieve when you would stay with
>> `group_misfit_task -> migrate_misfit`.
>>
>> I think cpu_load(rq) can be used instead of `rq->misfit_task_load` in
>> the migrate_misfit case of find_busiest_queue() too.
>
> I don't think because you can have a higher cpu_load() but not being misfit

You're right, I forgot about this. Essentially we would need extra state
(e.g. in lb_env) to save which CPU in the busiest group has the misfit.