Re: [PATCH v2 1/2] sched/schedutil: rework performance estimation

From: Vincent Guittot
Date: Tue Oct 31 2023 - 05:49:03 EST


Hi Lukasz,

On Mon, 30 Oct 2023 at 18:45, Lukasz Luba <lukasz.luba@xxxxxxx> wrote:
>
> Hi Vincent,
>
> On 10/26/23 18:09, Vincent Guittot wrote:
> > The current method to take into account uclamp hints when estimating the
> > target frequency can end up in a situation where the selected target
> > frequency is finally higher than the uclamp hints whereas there is no real
> > need. Such cases mainly happen because we are currently mixing the
> > traditional scheduler utilization signal with the uclamp performance
> > hints. By adding these 2 metrics, we lose important information when
> > it comes to selecting the target frequency and we have to make some
> > assumptions which can't fit all cases.
> >
> > Rework the interface between the scheduler and schedutil governor in order
> > to propagate all information down to the cpufreq governor.
> >
> > effective_cpu_util() interface changes and now returns the actual
> > utilization of the CPU with 2 optional inputs:
> > - The minimum performance for this CPU; typically the capacity to handle
> > the deadline task and the interrupt pressure. But also uclamp_min
> > request when available.
> > - The maximum targeting performance for this CPU which reflects the
> > maximum level that we would like to not exceed. By default it will be
> > the CPU capacity but can be reduced because of some performance hints
> > set with uclamp. The value can be lower than actual utilization and/or
> > min performance level.
>
> You have probably missed my question in the last v1 patch set.

Yes, sorry

>
> The description above needs a bit of clarification, since looking at the
> patches some dark corners are introduced IMO:
>
> Currently, we have a less aggressive power saving policy than this
> proposal.
>
> The questions:
> What if the PD has 4 CPUs, the max util found is 500 and is from a CPU
> w/ uclamp_max, but there is another CPU with normal utilization 499?
> What should be the final frequency for that PD?

We now follow the same sequence everywhere, which can be summarized by:

for each cpu sharing the same frequency domain:
    util = cpu_util(cpu)
    eff_util = effective_cpu_util(util, &min, &max)
    /* applies the dvfs headroom if needed */
    eff_util = sugov_effective_cpu_perf(eff_util, min, max)
    max_util = max(max_util, eff_util)

EAS anticipates the impact of the waking task on utilization and max,
but the end result is the same as above once the task is enqueued, so I
didn't show it for simplicity.

Coming back to your example:
CPU0 has uclamp_max = 500 and an actual utilization above 500. Its
eff_util will be 500.
CPU1 has no uclamp_max constraint and an actual utilization of 499,
which the dvfs headroom increases to 623 in
sugov_effective_cpu_perf().

The final max_util will be 623.

With the current implementation, we apply the dvfs headroom to the
final max_util (which comes from CPU0 with uclamp_max == 500), whereas
we now apply the dvfs headroom to each CPU inside
sugov_effective_cpu_perf().

The main difference is that if CPU1 has an actual utilization of 400,
the max_util of the frequency domain will be 500, whereas it is 625
after applying the dvfs headroom with the current implementation.
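
To make the arithmetic above easy to check, here is a small standalone C sketch of the two orderings. It is not the kernel code: struct cpu_sample, dvfs_headroom(), domain_perf_per_cpu() and domain_perf_domain_wide() are hypothetical names, the headroom is assumed to be util + util/4 with integer arithmetic, and the "current" path is reduced to "clamp, take the max, then add headroom".

```c
#include <stddef.h>

/* Hypothetical per-CPU inputs; field names are illustrative only. */
struct cpu_sample {
	unsigned long util;     /* actual utilization of the CPU */
	unsigned long min_perf; /* minimum performance, e.g. uclamp_min */
	unsigned long max_perf; /* maximum performance, e.g. uclamp_max or capacity */
};

/* Assumed 25% dvfs headroom: util + util / 4, integer arithmetic. */
unsigned long dvfs_headroom(unsigned long util)
{
	return util + (util >> 2);
}

/*
 * Model of the reworked sequence: headroom is applied per CPU, the
 * result is capped by max_perf and raised to at least min_perf, and
 * only then is the maximum over the frequency domain taken.
 */
unsigned long domain_perf_per_cpu(const struct cpu_sample *cpus, size_t nr)
{
	unsigned long max_util = 0;

	for (size_t i = 0; i < nr; i++) {
		unsigned long perf = dvfs_headroom(cpus[i].util);

		if (perf > cpus[i].max_perf)
			perf = cpus[i].max_perf;
		if (perf < cpus[i].min_perf)
			perf = cpus[i].min_perf;
		if (perf > max_util)
			max_util = perf;
	}
	return max_util;
}

/*
 * Simplified model of the current behaviour: each CPU's utilization is
 * clamped first, the maximum over the domain is taken, and the headroom
 * is applied once to that maximum.
 */
unsigned long domain_perf_domain_wide(const struct cpu_sample *cpus, size_t nr)
{
	unsigned long max_util = 0;

	for (size_t i = 0; i < nr; i++) {
		unsigned long util = cpus[i].util;

		if (util > cpus[i].max_perf)
			util = cpus[i].max_perf;
		if (util < cpus[i].min_perf)
			util = cpus[i].min_perf;
		if (util > max_util)
			max_util = util;
	}
	return dvfs_headroom(max_util);
}
```

Feeding it CPU0 = {600, 0, 500} and CPU1 = {499, 0, 1024} gives 623 for the per-CPU ordering and 625 for the domain-wide one. With CPU1 at 400 instead, the results are 500 and 625. The value 600 for CPU0 is just an arbitrary pick for "above 500".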

>
> In current design, where we care more about 'delivered performance
> to the tasks' than power saving, the +20% would be applied for the
> frequency. Therefore if that CPU with 499 util doesn't have uclamp_max,
> it would get a decent amount of idle time for its tasks (to compensate
> some workload variation).

CPU1 with 499 still gets its 25% margin, or did I miss something in your example?

Vincent

>
> Regards,
> Lukasz