Re: [PATCH v4] sched/fair: unlink misfit task from cpu overutilized

From: Dietmar Eggemann
Date: Thu Jan 26 2023 - 06:42:19 EST


On 19/01/2023 17:42, Vincent Guittot wrote:
> By taking into account uclamp_min, the 1:1 relation between task misfit
> and cpu overutilized is no more true as a task with a small util_avg may
> not fit a high capacity cpu because of uclamp_min constraint.
>
> Add a new state in util_fits_cpu() to reflect the case that task would fit
> a CPU except for the uclamp_min hint which is a performance requirement.
>
> Use -1 to reflect that a CPU doesn't fit only because of uclamp_min so we
> can use this new value to take additional action to select the best CPU
> that doesn't match uclamp_min hint.
>
> Signed-off-by: Vincent Guittot <vincent.guittot@xxxxxxxxxx>
> ---
>
> Change since v3:
> - Keep current condition for uclamp_max_fits in util_fits_cpu()
> - Update some comments

We already discussed whether this patch can also remove Capacity
Inversion (CapInv).

After studying the code again, I'm not so sure anymore.

This patch:

(1) adds a dedicated return value (-1) to util_fits_cpu() when:

`util fits within 80% of capacity_of() && util < uclamp_min && uclamp_min > capacity_orig_thermal` (region c)

(2) enhances the CPU selection in sic() (select_idle_capacity()) and
feec() (find_energy_efficient_cpu()) to handle this new return value.
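To make region c concrete, here is a minimal userspace sketch of the
tri-state return described in (1). It is a simplified model, not the
kernel function: capacity_of() and capacity_orig_thermal are passed in
as plain numbers, the uclamp_max handling is left out, and
util_fits_cpu_model() is a hypothetical name. fits_capacity() uses the
same 1280/1024 margin as the kernel macro of that name.

```c
#include <assert.h>
#include <stdbool.h>

#define SCHED_CAPACITY_SCALE	1024UL

/* Same 80% margin as the kernel's fits_capacity(): util * 1.25 < cap */
static bool fits_capacity(unsigned long util, unsigned long cap)
{
	return util * 1280 < cap * 1024;
}

/*
 * Hypothetical model of the new util_fits_cpu() return values:
 *   1: util and uclamp_min both fit
 *   0: util doesn't fit
 *  -1: util fits, only uclamp_min doesn't (region c)
 * uclamp_max handling is deliberately left out.
 */
static int util_fits_cpu_model(unsigned long util,
			       unsigned long uclamp_min,
			       unsigned long uclamp_max,
			       unsigned long capacity,	/* capacity_of() */
			       unsigned long capacity_orig_thermal)
{
	bool fits;

	assert(uclamp_min <= SCHED_CAPACITY_SCALE);
	assert(uclamp_max <= SCHED_CAPACITY_SCALE);

	/* Does the real (unclamped) util fit within 80% of capacity_of()? */
	fits = fits_capacity(util, capacity);

	/* Handle the case uclamp_min > uclamp_max */
	if (uclamp_min > uclamp_max)
		uclamp_min = uclamp_max;

	/* Region c: util fits but the uclamp_min performance hint doesn't */
	if (fits && util < uclamp_min && uclamp_min > capacity_orig_thermal)
		return -1;

	return fits;
}
```

With capacity_of() == capacity_orig_thermal == 446, util 100 and
uclamp_min 512 give -1, while the same util with uclamp_min 0 gives 1.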

IMHO this doesn't make the intent of CapInv in util_fits_cpu()
obsolete. util_fits_cpu() currently uses:

in CapInv:

capacity_orig = capacity_orig_of() - thermal_load_avg
capacity_orig_thermal = capacity_orig_of() - thermal_load_avg

not in CapInv:

capacity_orig = capacity_orig_of()
capacity_orig_thermal = capacity_orig_of() - th_pressure
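The two cases above, written out as a sketch. get_caps() and struct
cpu_caps are hypothetical; orig_cap stands in for capacity_orig_of(),
thermal_avg for thermal_load_avg() and th_pressure for the
instantaneous thermal pressure. How a CPU ends up in CapInv is outside
this sketch and is passed in as a flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical container for the two values util_fits_cpu() compares against */
struct cpu_caps {
	unsigned long capacity_orig;
	unsigned long capacity_orig_thermal;
};

/*
 * orig_cap    ~ capacity_orig_of()
 * thermal_avg ~ thermal_load_avg()
 * th_pressure ~ instantaneous thermal pressure
 */
static struct cpu_caps get_caps(unsigned long orig_cap, bool in_capinv,
				unsigned long thermal_avg,
				unsigned long th_pressure)
{
	struct cpu_caps c;

	/* Avoid unsigned wrap-around in the subtractions below */
	assert(thermal_avg <= orig_cap && th_pressure <= orig_cap);

	if (in_capinv) {
		/* In CapInv both values use the averaged thermal signal */
		c.capacity_orig = orig_cap - thermal_avg;
		c.capacity_orig_thermal = orig_cap - thermal_avg;
	} else {
		/* Otherwise only capacity_orig_thermal is reduced, by th_pressure */
		c.capacity_orig = orig_cap;
		c.capacity_orig_thermal = orig_cap - th_pressure;
	}
	return c;
}
```

For a 1024 CPU with thermal_avg 300 and th_pressure 400, CapInv gives
724 for both values, while outside CapInv capacity_orig stays 1024 and
capacity_orig_thermal is 624.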

Maybe I'm still missing part of the story?

v3 hints at removing the CapInv bits in the next version:

https://lkml.kernel.org/r/20230115001906.v7uq4ddodrbvye7d@airbuntu

> kernel/sched/fair.c | 105 ++++++++++++++++++++++++++++++++++----------
> 1 file changed, 82 insertions(+), 23 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index d4db72f8f84e..54e14da53274 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -4561,8 +4561,8 @@ static inline int util_fits_cpu(unsigned long util,
> * handle the case uclamp_min > uclamp_max.
> */
> uclamp_min = min(uclamp_min, uclamp_max);
> - if (util < uclamp_min && capacity_orig != SCHED_CAPACITY_SCALE)
> - fits = fits && (uclamp_min <= capacity_orig_thermal);
> + if (fits && (util < uclamp_min) && (uclamp_min > capacity_orig_thermal))
> + return -1;

Or does defining 'return -1 if util fits but uclamp_min doesn't' make
the distinction between capacity_orig and capacity_orig_thermal, and
therefore CapInv, obsolete?
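Looking only at the quoted hunk, here are the old and the new
uclamp_min checks side by side. old_rule() and new_rule() are
hypothetical names, fits is assumed to be the result of the earlier
util/uclamp_max checks, and uclamp_min is assumed to be already clamped
to uclamp_max.

```c
#include <assert.h>
#include <stdbool.h>

#define SCHED_CAPACITY_SCALE	1024UL

/*
 * Before the patch: a uclamp_min miss turns fits into 0; the check is
 * skipped when capacity_orig == SCHED_CAPACITY_SCALE.
 */
static int old_rule(bool fits, unsigned long util, unsigned long uclamp_min,
		    unsigned long capacity_orig,
		    unsigned long capacity_orig_thermal)
{
	assert(uclamp_min <= SCHED_CAPACITY_SCALE);

	if (util < uclamp_min && capacity_orig != SCHED_CAPACITY_SCALE)
		fits = fits && (uclamp_min <= capacity_orig_thermal);
	return fits;
}

/* With the patch: a uclamp_min-only miss is reported as -1 */
static int new_rule(bool fits, unsigned long util, unsigned long uclamp_min,
		    unsigned long capacity_orig_thermal)
{
	assert(uclamp_min <= SCHED_CAPACITY_SCALE);

	if (fits && (util < uclamp_min) && (uclamp_min > capacity_orig_thermal))
		return -1;
	return fits;
}
```

As far as this hunk goes, the new check no longer tests capacity_orig
at all: with util 100, uclamp_min 1000, capacity_orig 1024 and
capacity_orig_thermal 900, old_rule() keeps 1 while new_rule() returns
-1.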

[...]

> static inline void update_misfit_status(struct task_struct *p, struct rq *rq)
> @@ -6138,6 +6142,7 @@ static inline bool cpu_overutilized(int cpu)
> unsigned long rq_util_min = uclamp_rq_get(cpu_rq(cpu), UCLAMP_MIN);
> unsigned long rq_util_max = uclamp_rq_get(cpu_rq(cpu), UCLAMP_MAX);
>
> + /* Return true only if the utilization doesn't fits CPU's capacity */

Small typo: s/doesn't fits/doesn't fit/

[...]

> @@ -6946,12 +6952,28 @@ select_idle_capacity(struct task_struct *p, struct sched_domain *sd, int target)
>
> if (!available_idle_cpu(cpu) && !sched_idle_cpu(cpu))
> continue;
> - if (util_fits_cpu(task_util, util_min, util_max, cpu))
> +
> + fits = util_fits_cpu(task_util, util_min, util_max, cpu);
> +
> + /* This CPU fits with all requirements */
> + if (fits > 0)
> return cpu;
> + /*
> + * Only the min performance hint (i.e. uclamp_min) doesn't fit.
> + * Look for the CPU with best capacity.
> + */
> + else if (fits < 0)
> + cpu_cap = capacity_orig_of(cpu) - thermal_load_avg(cpu_rq(cpu));

I still don't see why we use thermal_load_avg() here. It looks to me
like this would only match the CapInv case in util_fits_cpu().
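A rough model of the quoted sic() loop, showing which capacity the
uclamp_min-only candidates are ranked by. struct cpu_snapshot and
pick_idle_capacity() are hypothetical; cap_orig and thermal stand in
for capacity_orig_of() and thermal_load_avg(). The tail of the loop is
elided in the quote, so keeping the highest-capacity fits < 0
candidate and returning -1 when there is none are assumptions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-CPU snapshot; not a kernel structure */
struct cpu_snapshot {
	bool idle;		/* available_idle_cpu() || sched_idle_cpu() */
	int fits;		/* util_fits_cpu() result: 1, 0 or -1 */
	unsigned long cap_orig;	/* stands in for capacity_orig_of() */
	unsigned long thermal;	/* stands in for thermal_load_avg() */
};

static int pick_idle_capacity(const struct cpu_snapshot *cpus, int nr)
{
	unsigned long best_cap = 0;
	int best_cpu = -1;

	for (int cpu = 0; cpu < nr; cpu++) {
		const struct cpu_snapshot *c = &cpus[cpu];
		unsigned long cpu_cap;

		if (!c->idle)
			continue;

		/* This CPU fits with all requirements */
		if (c->fits > 0)
			return cpu;

		/* Only uclamp_min doesn't fit: rank by cap_orig - thermal */
		if (c->fits < 0) {
			assert(c->thermal <= c->cap_orig);
			cpu_cap = c->cap_orig - c->thermal;
			if (best_cpu < 0 || cpu_cap > best_cap) {
				best_cap = cpu_cap;
				best_cpu = cpu;
			}
		}
	}
	return best_cpu;
}
```

With an idle 1024 CPU whose thermal_load_avg() is 300 and an idle 800
CPU without thermal load, both only missing uclamp_min, the model picks
the 800 CPU (800 > 724). Since the quoted line subtracts
thermal_load_avg() unconditionally, this ranking does not depend on
whether the CPU is in CapInv.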

[...]