Re: [PATCH v2] sched: Consolidate cpufreq updates

From: Qais Yousef
Date: Mon May 06 2024 - 20:57:13 EST


On 05/06/24 12:05, Peter Zijlstra wrote:
> On Mon, May 06, 2024 at 12:31:03AM +0100, Qais Yousef wrote:
>
> > +static inline void update_cpufreq_ctx_switch(struct rq *rq, struct task_struct *prev)
> > +{
> > +#ifdef CONFIG_CPU_FREQ
> > +	unsigned int flags = 0;
> > +
> > +#ifdef CONFIG_SMP
> > +	if (unlikely(current->sched_class == &stop_sched_class))
> > +		return;
> > +#endif
>
> why do we care about the stop class? It shouldn't, in general, consume a
> lot of cycles.
>
> > +
> > +	if (unlikely(current->sched_class == &idle_sched_class))
> > +		return;
>
> And why do we care about idle? Specifically this test doesn't capture
> force-idle threads. Notably see is_idle_task().

It's just that we don't want these tasks to 'pollute' cpufreq updates, since they
shouldn't care about, nor contribute to, what frequency the CPU should be running at.

Yes, I missed is_idle_task() in the exclusion list - which can be
simplified as you suggest later.
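
For reference, is_idle_task() checks PF_IDLE, which also covers the force-idle /
idle-injection threads that the sched_class comparison misses. Roughly, from
include/linux/sched.h:

static __always_inline bool is_idle_task(const struct task_struct *p)
{
	return !!(p->flags & PF_IDLE);
}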

>
> > +
> > +	if (unlikely(task_has_idle_policy(current)))
> > +		return;
> > +
> > +	if (likely(fair_policy(current->policy))) {
> > +
> > +		if (unlikely(current->in_iowait)) {
> > +			flags |= SCHED_CPUFREQ_IOWAIT | SCHED_CPUFREQ_FORCE_UPDATE;
> > +			goto force_update;
> > +		}
> > +
> > +#ifdef CONFIG_SMP
> > +		/*
> > +		 * Allow cpufreq updates once for every update_load_avg() decay.
> > +		 */
> > +		if (unlikely(rq->cfs.decayed)) {
> > +			rq->cfs.decayed = false;
> > +			goto force_update;
> > +		}
> > +#endif
> > +		return;
> > +	}
> > +
> > +	/*
> > +	 * RT and DL should always send a freq update. But we can do some
> > +	 * simple checks to avoid it when we know it's not necessary.
> > +	 */
> > +	if (rt_task(current) && rt_task(prev)) {
>
> IIRC dl tasks also match rt_task, so your else clause might not work the
> way you've intended.
>
> > +#ifdef CONFIG_UCLAMP_TASK
> > +		unsigned long curr_uclamp_min = uclamp_eff_value(current, UCLAMP_MIN);
> > +		unsigned long prev_uclamp_min = uclamp_eff_value(prev, UCLAMP_MIN);
> > +
> > +		if (curr_uclamp_min == prev_uclamp_min)
> > +#endif
> > +			return;
> > +	} else if (dl_task(current) && current->dl.flags & SCHED_FLAG_SUGOV) {
>
> Notably DL tasks also match rt_task(), so I don't think this clause

Hmm, yes. dl priority is negative, so rt_task() will capture dl tasks too.
Shouldn't we fix the function? I can send a separate patch.

static inline int rt_task(struct task_struct *p)
{
	return rt_prio(p->prio) && !dl_prio(p->prio);
}
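
For context, dl tasks run at negative prio (below MAX_DL_PRIO == 0), while
rt_prio() is true for anything below MAX_RT_PRIO, so every dl task passes
rt_prio() too. Roughly, from include/linux/sched/deadline.h:

static inline int dl_prio(int prio)
{
	if (unlikely(prio < MAX_DL_PRIO))
		return 1;
	return 0;
}

static inline int dl_task(struct task_struct *p)
{
	return dl_prio(p->prio);
}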

> exactly does as you expect. Also, isn't the flags check sufficient on
> its own?

I considered this, but opted to keep the dl_task() check, conservatively
assuming that access to the dl structure should only be considered valid for dl
tasks. It seemed safer to me against potential future changes to the access
pattern.

Happy to drop it if this is too conservative.

>
> > +		/* Ignore sugov kthreads, they're responding to our requests */
> > +		return;
> > +	}
> > +
> > +	flags |= SCHED_CPUFREQ_FORCE_UPDATE;
> > +
> > +force_update:
> > +	cpufreq_update_util(rq, flags);
> > +#endif
> > +}
>
> But over-all the thing seems very messy, mixing sched_class, policy and
> prio based selection methods.

Yeah, I started with basic conditions and worked my way through what should be
excluded. We don't have fair_task(), so I used fair_policy(), and out of habit
I continued to use rt/dl_task() without realizing the caveat you highlighted.

>
> Can't this be cleaned up somewhat?
>
>
> Notably, if you structure it something like so:
>
> 	if (fair_policy(current)) {
> 		...
> 		return;
> 	}
>
> 	if (rt_policy(current)) {
> 		if (dl_task(current) && current->dl.flags & SCHED_FLAG_SUGOV)
> 			return;
> 		if (rt_policy(prev) && uclamps_match(current, prev))
> 			return;
> 		...
> 		return;
> 	}
>
> 	/* everybody else gets nothing */
> 	return;
>
> You get a lot less branches in the common paths, no?

Yes. How about this? Since the stopper class appears as RT, we still need to
check for that class specifically.

Thanks!

--->8---

static inline void update_cpufreq_ctx_switch(struct rq *rq, struct task_struct *prev)
{
#ifdef CONFIG_CPU_FREQ
	if (likely(fair_policy(current->policy))) {

		if (unlikely(current->in_iowait)) {
			cpufreq_update_util(rq, SCHED_CPUFREQ_IOWAIT | SCHED_CPUFREQ_FORCE_UPDATE);
			return;
		}

#ifdef CONFIG_SMP
		/*
		 * Allow cpufreq updates once for every update_load_avg() decay.
		 */
		if (unlikely(rq->cfs.decayed)) {
			rq->cfs.decayed = false;
			cpufreq_update_util(rq, 0);
			return;
		}
#endif
		return;
	}

	/*
	 * RT and DL should always send a freq update. But we can do some
	 * simple checks to avoid it when we know it's not necessary.
	 */
	if (task_is_realtime(current)) {
		if (dl_task(current) && current->dl.flags & SCHED_FLAG_SUGOV) {
			/* Ignore sugov kthreads, they're responding to our requests */
			return;
		}

		if (rt_task(current) && rt_task(prev)) {
#ifdef CONFIG_UCLAMP_TASK
			unsigned long curr_uclamp_min = uclamp_eff_value(current, UCLAMP_MIN);
			unsigned long prev_uclamp_min = uclamp_eff_value(prev, UCLAMP_MIN);

			if (curr_uclamp_min == prev_uclamp_min)
#endif
				return;
		}

#ifdef CONFIG_SMP
		if (unlikely(current->sched_class == &stop_sched_class))
			return;
#endif

		cpufreq_update_util(rq, SCHED_CPUFREQ_FORCE_UPDATE);
		return;
	}

	/* Everything else shouldn't trigger a cpufreq update */
	return;
#endif
}
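
If you'd rather hide the uclamp comparison behind a helper as in your sketch,
something like this could work (uclamps_match() is your suggested name, not an
existing helper; untested):

#ifdef CONFIG_UCLAMP_TASK
static inline bool uclamps_match(struct task_struct *curr, struct task_struct *prev)
{
	return uclamp_eff_value(curr, UCLAMP_MIN) == uclamp_eff_value(prev, UCLAMP_MIN);
}
#else
static inline bool uclamps_match(struct task_struct *curr, struct task_struct *prev)
{
	/* Without uclamp there's nothing to compare; treat as matching */
	return true;
}
#endif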