Re: [PATCH v2] sched: Consolidate cpufreq updates

From: Peter Zijlstra
Date: Mon May 06 2024 - 06:05:36 EST


On Mon, May 06, 2024 at 12:31:03AM +0100, Qais Yousef wrote:

> +static inline void update_cpufreq_ctx_switch(struct rq *rq, struct task_struct *prev)
> +{
> +#ifdef CONFIG_CPU_FREQ
> + unsigned int flags = 0;
> +
> +#ifdef CONFIG_SMP
> + if (unlikely(current->sched_class == &stop_sched_class))
> + return;
> +#endif

why do we care about the stop class? It shouldn't, in general, consume a
lot of cycles.

> +
> + if (unlikely(current->sched_class == &idle_sched_class))
> + return;

And why do we care about idle? Specifically this test doesn't capture
force-idle threads. Notably see is_idle_task().

> +
> + if (unlikely(task_has_idle_policy(current)))
> + return;
> +
> + if (likely(fair_policy(current->policy))) {
> +
> + if (unlikely(current->in_iowait)) {
> + flags |= SCHED_CPUFREQ_IOWAIT | SCHED_CPUFREQ_FORCE_UPDATE;
> + goto force_update;
> + }
> +
> +#ifdef CONFIG_SMP
> + /*
> + * Allow cpufreq updates once for every update_load_avg() decay.
> + */
> + if (unlikely(rq->cfs.decayed)) {
> + rq->cfs.decayed = false;
> + goto force_update;
> + }
> +#endif
> + return;
> + }
> +
> + /*
> + * RT and DL should always send a freq update. But we can do some
> + * simple checks to avoid it when we know it's not necessary.
> + */
> + if (rt_task(current) && rt_task(prev)) {

IIRC dl tasks also match rt_task, so your else clause might not work the
way you've intended.

> +#ifdef CONFIG_UCLAMP_TASK
> + unsigned long curr_uclamp_min = uclamp_eff_value(current, UCLAMP_MIN);
> + unsigned long prev_uclamp_min = uclamp_eff_value(prev, UCLAMP_MIN);
> +
> + if (curr_uclamp_min == prev_uclamp_min)
> +#endif
> + return;
> + } else if (dl_task(current) && current->dl.flags & SCHED_FLAG_SUGOV) {

Notably DL tasks also match rt_task(), so I don't think this clause
exactly does as you expect. Also, isn't the flags check sufficient on
it's own?

> + /* Ignore sugov kthreads, they're responding to our requests */
> + return;
> + }
> +
> + flags |= SCHED_CPUFREQ_FORCE_UPDATE;
> +
> +force_update:
> + cpufreq_update_util(rq, flags);
> +#endif
> +}

But over-all the thing seems very messy, mixing sched_class, policy and
prio based selection methods.

Can't this be cleaned up somewhat?


Notably, if you structure it something like so:

if (fair_policy(current)) {
...
return;
}

if (rt_policy(current)) {
if (dl_task(current) && current->dl.flags & SCHED_FLAG_SUGOV)
return;
if (rt_policy(prev) && uclamps_match(current, prev))
return;
...
return;
}

/* everybody else gets nothing */
return;

You get a lot less branches in the common paths, no?