Re: [PATCH v3] sched: Consolidate cpufreq updates
From: Qais Yousef
Date: Wed May 15 2024 - 06:41:54 EST
On 05/15/24 12:00, Dietmar Eggemann wrote:
> On 14/05/2024 00:09, Qais Yousef wrote:
> > On 05/13/24 14:43, Dietmar Eggemann wrote:
> >> On 12/05/2024 21:00, Qais Yousef wrote:
> >>
> >> [...]
> >>
> >>> @@ -4682,7 +4659,7 @@ static void attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *s
> >>>
> >>> add_tg_cfs_propagate(cfs_rq, se->avg.load_sum);
> >>>
> >>> - cfs_rq_util_change(cfs_rq, 0);
> >>> + cpufreq_update_util(rq_of(cfs_rq), 0);
> >>
> >> Isn't this slightly different now?
> >>
> >> before:
> >>
> >> if (&rq->cfs == cfs_rq) {
> >> cpufreq_update_util(rq, ....)
> >> }
> >>
> >> now:
> >>
> >> cpufreq_update_util(rq_of(cfs_rq), ...)
> >>
> >> You should get way more updates from attach/detach now.
> >
> > Yes, well spotted!
> >
> > Looking at the path more closely, I can see this is called from the
> > enqueue_task_fair() path when a task migrates to a new CPU. It is also
> > called from attach_task_cfs_rq(), which is reached either via
> > switched_to_fair(), which I already cover in the policy change for the
> > RUNNING task, or via task_change_group_fair(), which is what I originally
> > understood Vincent was referring to. I moved the update to this function,
> > after the detach/attach operations, with better guards to avoid
> > unnecessary updates.
>
> Yeah, all !root cfs_rq attach or detach wouldn't change anything since
> the util_avg wouldn't have propagated to the root cfs_rq yet. So
> sugov_get_util() wouldn't see a difference.
>
> Yes, enqueue_entity() sets DO_ATTACH unconditionally.
>
> And dequeue_entity() sets DO_DETACH for a migrating (!wakeup migrating)
> task.
>
> For a wakeup migrating task we have remove_entity_load_avg() but this
> can't remove util_avg from the cfs_rq. This is deferred to
> update_cfs_rq_load_avg() in update_load_avg() or __update_blocked_fair().
>
> And switched_{to,from}_fair() (check_class_changed()) and
> task_change_group_fair() are the other 2 users of
> {attach,detach}_entity_load_avg(). (plus online_fair_sched_group() for
> attach).
>
> > I understood this will lead to a big change that is better applied
> > immediately than left until the next context switch. But I'll ask the
> > question again: can we drop this and defer to the context switch?
>
> Hard to say really, probably we can. All benchmarks with score numbers
> will create plenty of context switches so you won't see a diff. And for
> lighter testcases you would have to study the differences in trace
> files and reason about the implications of potentially kicking CPUfreq a
> little bit later.
I lean towards dropping this and letting the CPU state be considered 'settled'
at the next context switch.
But I'll wait to hear more opinions before I post a new version.
Thanks!
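
P.S. For anyone skimming the thread, the behavioural difference Dietmar
pointed out can be illustrated with a toy userspace model. This is only a
sketch: the structs below are simplified stand-ins, not the kernel's
definitions, cpufreq_update_util() is reduced to a counter, and
count_updates() is a hypothetical helper that does not exist in the kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for illustration; NOT the kernel's definitions. */
struct rq;

struct cfs_rq {
	struct rq *rq;		/* runqueue this cfs_rq belongs to */
};

struct rq {
	struct cfs_rq cfs;	/* root cfs_rq, embedded in the rq */
	int cpufreq_updates;	/* counts calls into the (mock) governor hook */
};

static struct rq *rq_of(struct cfs_rq *cfs_rq)
{
	return cfs_rq->rq;
}

/* Mock: the real hook calls into the cpufreq governor; here we only count. */
static void cpufreq_update_util(struct rq *rq, unsigned int flags)
{
	(void)flags;
	rq->cpufreq_updates++;
}

/* "before": only the root cfs_rq triggers an update. */
static void cfs_rq_util_change(struct cfs_rq *cfs_rq, unsigned int flags)
{
	struct rq *rq = rq_of(cfs_rq);

	if (&rq->cfs == cfs_rq)
		cpufreq_update_util(rq, flags);
}

/*
 * Hypothetical helper: simulate one attach on the root cfs_rq and two on a
 * task-group cfs_rq, and return how many cpufreq updates were requested.
 */
int count_updates(int guarded)
{
	struct rq rq = { .cpufreq_updates = 0 };
	struct cfs_rq group = { .rq = &rq };
	struct cfs_rq *targets[] = { &rq.cfs, &group, &group };
	size_t i;

	rq.cfs.rq = &rq;
	assert(rq_of(&group) == &rq);

	for (i = 0; i < sizeof(targets) / sizeof(targets[0]); i++) {
		if (guarded)
			cfs_rq_util_change(targets[i], 0);	/* "before" */
		else
			cpufreq_update_util(rq_of(targets[i]), 0); /* "now" */
	}

	return rq.cpufreq_updates;
}
```

In this model count_updates(1) returns 1 and count_updates(0) returns 3,
i.e. every !root attach now requests an update even though, as Dietmar
notes, the root cfs_rq utilization has not changed yet.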