Re: [PATCH v2] sched/fair: unlink misfit task from cpu overutilized

From: Qais Yousef
Date: Sat Jan 14 2023 - 13:21:21 EST


On 01/13/23 09:53, Vincent Guittot wrote:
> On Thu, 12 Jan 2023 at 15:26, Qais Yousef <qyousef@xxxxxxxxxxx> wrote:
> >
> > Hi Vincent
> >
> > On 12/28/22 17:54, Vincent Guittot wrote:
> > > By taking into account uclamp_min, the 1:1 relation between task misfit
> > > and cpu overutilized is no more true as a task with a small util_avg of
> > > may not may not fit a high capacity cpu because of uclamp_min constraint.
> >
> > Wouldn't it be better to split this into two patches
> >
> > * Unlink/Decouple misfit ...
> > * Unlink/Decouple util_fits_cpu from HMP
> >
> > ?
>
> I'm afraid that git bisect could then raise a false positive between
> the 2 commits

They should be independent, no? Anyway, I don't feel that strongly about it but
as a minor nit the commit subject could be better.

>
> >
> > >
> > > Add a new state in util_fits_cpu() to reflect the case that task would fit
> > > a CPU except for the uclamp_min hint which is a performance requirement.
> > >
> > > Use -1 to reflect that a CPU doesn't fit only because of uclamp_min so we
> > > can use this new value to take additional action to select the best CPU
> > > that doesn't match uclamp_min hint.
> >
> > This part has nothing to do with the commit subject. I think it's better to
> > split the patches if it's not too much work for you.
> >
> > >
> > > Signed-off-by: Vincent Guittot <vincent.guittot@xxxxxxxxxx>
> > > ---
> > >
> > > Change since v1:
> > > - fix some wrong conditions
> > > - take into account more cases
> > >
> > > kernel/sched/fair.c | 99 +++++++++++++++++++++++++++++++++------------
> > > 1 file changed, 74 insertions(+), 25 deletions(-)
> > >
> > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > > index 1649e7d71d24..57077f0a897e 100644
> > > --- a/kernel/sched/fair.c
> > > +++ b/kernel/sched/fair.c
> > > @@ -4582,8 +4582,7 @@ static inline int util_fits_cpu(unsigned long util,
> > > * 2. The system is being saturated when we're operating near
> > > * max capacity, it doesn't make sense to block overutilized.
> > > */
> > > - uclamp_max_fits = (capacity_orig == SCHED_CAPACITY_SCALE) && (uclamp_max == SCHED_CAPACITY_SCALE);
> > > - uclamp_max_fits = !uclamp_max_fits && (uclamp_max <= capacity_orig);
> > > + uclamp_max_fits = (uclamp_max <= capacity_orig) || (capacity_orig == SCHED_CAPACITY_SCALE);
> > > fits = fits || uclamp_max_fits;
> > >
> > > /*
> > > @@ -4618,8 +4617,8 @@ static inline int util_fits_cpu(unsigned long util,
> > > * handle the case uclamp_min > uclamp_max.
> > > */
> > > uclamp_min = min(uclamp_min, uclamp_max);
> > > - if (util < uclamp_min && capacity_orig != SCHED_CAPACITY_SCALE)
> > > - fits = fits && (uclamp_min <= capacity_orig_thermal);
> > > + if (fits && (util < uclamp_min) && (uclamp_min > capacity_orig_thermal))
> > > + return -1;
> > >
> > > return fits;
> > > }
> > > @@ -4629,7 +4628,7 @@ static inline int task_fits_cpu(struct task_struct *p, int cpu)
> > > unsigned long uclamp_min = uclamp_eff_value(p, UCLAMP_MIN);
> > > unsigned long uclamp_max = uclamp_eff_value(p, UCLAMP_MAX);
> > > unsigned long util = task_util_est(p);
> > > - return util_fits_cpu(util, uclamp_min, uclamp_max, cpu);
> > > + return (util_fits_cpu(util, uclamp_min, uclamp_max, cpu) > 0);
> >
> > So the big difference between your approach and my approach is that
> > task_fits_cpu() and asym_fits_cpu() now are very strict in regards to thermal
> > pressure since with your approach we delegate the smartness to the caller.
> >
> > Should we add a comment for these 2 users to make it obvious we intentionally
> > ignore the '-1' value and why it is okay?
>
> I can probably add something saying that a positive value (1 in this
> case) is the only one that says that a task fits to a cpu. Other
> returned values imply that either the utilization or the uclamp
> constraints are not meet

Sounds good to me. So at least whoever looks at this later and doesn't know
the history will at least try to dig more before using it as-is and assume
wonders.

> >
> > I'm not sure I can write a reasonable rationale myself. I'm actually worried
> > this might subtly break decisions made by select_idle_capacity() or feec() when
> > doing the LB.
> >
> > Have you considered this?
>
> Yes, that why i have keep the changes in 1 patch

Okay I think I see what you're on about now.

[...]

> > > + /*
> > > + * Both fit for the task but best energy cpu has lower
> > > + * energy impact.
> > > + */
> > > + if ((max_fits > 0) &&
> >
> > Shouldn't this be
> >
> > if ((max_fits > 0) && (max_fits == best_fits) &&
> > ?
>
> I will use the below which match better the comment and the fact that
> both fit for the task:
>
> + if ((max_fits > 0) && (best_fits > 0) &&
>

Fine by me.

> >
> > We should update best_delta unconditionally first time we hit max_fits = 1, no?
> >
> > I think it's worth extending the comment with something along the lines of
> >
> > * ... except for the first time max_fits becomes 1
> > * then we must update best_delta unconditionally
>
> With the new condition above this is not needed anymore

+1

>
> >
> > > + (cur_delta >= best_delta))
> > > + continue;
> > > +
> > > + best_delta = cur_delta;
> > > + best_energy_cpu = max_spare_cap_cpu;
> > > + best_fits = max_fits;
> > > + best_thermal_cap = cpu_thermal_cap;
> > > }
> > > }
> > > rcu_read_unlock();
> > >
> > > - if (best_delta < prev_delta)
> > > + if ((best_fits > prev_fits) ||
> > > + ((best_fits > 0) && (best_delta < prev_delta)) ||
> > > + ((best_fits < 0) && (best_thermal_cap > prev_thermal_cap)))
> > > target = best_energy_cpu;
> >
> > Overall I think the approach is sound. I tested it on my pinebook pro and
> > couldn't catch obvious breakage at least.
> >
> > I am still worried though about spilling the knowledge outside of
> > util_fits_cpu() is creating extra complexity in the callers and potentially
> > more fragility when these callers evolve overtime e.g:
> > task_fits_cpu()/asym_fits_cpu() gain a new user that must actually care about
> > the -1 return value.
>
> ask_fits_cpu()/asym_fits_cpu() remain simple booleans that return true
> if the task fits the cpu in regards to all requirements. If a new user
> wants to make smarter decisions like select_idle_capacity() or feec(),
> it will have to use util_fits_cpu(). Both handle the case where
> uclamp_min is not met differently.

I think with that comment added we can hope this will promote new users to
think more.

>
> >
> > I think we can still optimize the capacity inversion logic to use no loops
> > without having to spill the knowledge to the users/callers of util_fits_cpu(),
> > no?
>
> TBH, I don't know how because we are not always comparing the same
> things depending of the reason it doesn't fit

Your patch will do a more comprehensive search, true. It will try harder to
find a best fit. I do have a bit of a bias that some severe thermal conditions
are indication of a bigger problem to try harder to honour uclamp_min (warrant
the additional complexity). But I thought more about it and I think it actually
might be worth while. Xuewen had a system where medium's capacity was 900 and
bigs could easily be inverted with some thermal pressure on mediums too. So
this harder search can help these systems to keep these tasks on mediums. e.g:
if uclamp_min was 900 and big is inverted and medium is under pressure my
approach would have returned false for all CPUs then; while yours will keep it
on medium cpus.

>
> >
> > That said except for the few comments I had this LGTM anyway. Thanks for your
> > effort!
>
> Thanks
>
> I'm going to prepare v2

Thanks!


--
Qais Yousef