Re: [PATCH 3/3] sched/fair: schedutil: explicit update only when required

From: Joel Fernandes
Date: Thu May 17 2018 - 10:20:20 EST

Next message: Koo, Anthony: "RE: linux-next: Signed-off-by missing for commits in the drm tree"
Previous message: Jason Gunthorpe: "Re: [PATCH rdma-next 4/5] RDMA/hns: Add reset process for RoCE in hip08"
In reply to: Vincent Guittot: "Re: [PATCH 3/3] sched/fair: schedutil: explicit update only when required"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

Hi Patrick,

On Mon, May 14, 2018 at 05:32:06PM +0100, Patrick Bellasi wrote:
> On 12-May 23:25, Joel Fernandes wrote:
> > On Sat, May 12, 2018 at 11:04:43PM -0700, Joel Fernandes wrote:
> > > On Thu, May 10, 2018 at 04:05:53PM +0100, Patrick Bellasi wrote:
> > > > Schedutil updates for FAIR tasks are triggered implicitly each time a
> > > > cfs_rq's utilization is updated via cfs_rq_util_change(), currently
> > > > called by update_cfs_rq_load_avg(), when the utilization of a cfs_rq has
> > > > changed, and {attach,detach}_entity_load_avg().
> > > >
> > > > This design is based on the idea that "we should callback schedutil
> > > > frequently enough" to properly update the CPU frequency at every
> > > > utilization change. However, such an integration strategy has also
> > > > some downsides:
> > >
> > > I agree making the call explicit would make schedutil integration easier so
> > > that's really awesome. However I also fear that if some path in the fair
> > > class in the future changes the utilization but forgets to update schedutil
> > > explicitly (because they forgot to call the explicit public API) then the
> > > schedutil update wouldn't go through. In this case the previous design of
> > > doing the schedutil update in the wrapper kind of was a nice to have
>
> I cannot see right now other possible future paths where we can
> actually change the utilization signal without considering that,
> eventually, we should call an existing API to update schedutil if it
> makes sense.
>
> What I can see more likely instead, also because it already happened a
> couple of times, is that because of code changes in fair.c we end up
> calling (implicitly) schedutil with a wrong utilization value.
>
> Noticing this kind of broken dependency has already proven more
> difficult than noticing an update of the utilization without
> a corresponding explicit call of the public API would be.

Ok, we are in agreement this is a good thing to do :)

> > > > @@ -5397,9 +5366,27 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
> > > >  		update_cfs_group(se);
> > > >  	}
> > > >
> > > > -	if (!se)
> > > > +	/* The task is visible from the root cfs_rq */
> > > > +	if (!se) {
> > > > +		unsigned int flags = 0;
> > > > +
> > > >  		add_nr_running(rq, 1);
> > > >
> > > > +		if (p->in_iowait)
> > > > +			flags |= SCHED_CPUFREQ_IOWAIT;
> > > > +
> > > > +		/*
> > > > +		 * !last_update_time means we've passed through
> > > > +		 * migrate_task_rq_fair() indicating we migrated.
> > > > +		 *
> > > > +		 * IOW we're enqueueing a task on a new CPU.
> > > > +		 */
> > > > +		if (!p->se.avg.last_update_time)
> > > > +			flags |= SCHED_CPUFREQ_MIGRATION;
> > > > +
> > > > +		cpufreq_update_util(rq, flags);
> > > > +	}
> > > > +
> > > >  	hrtick_update(rq);
> > > >  }
> > > >
> > > > @@ -5456,10 +5443,12 @@ static void dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
> > > >  		update_cfs_group(se);
> > > >  	}
> > > >
> > > > +	/* The task is no more visible from the root cfs_rq */
> > > >  	if (!se)
> > > >  		sub_nr_running(rq, 1);
> > > >
> > > >  	util_est_dequeue(&rq->cfs, p, task_sleep);
> > > > +	cpufreq_update_util(rq, 0);
> > >
> > > One question about this change. In enqueue, throttle and unthrottle, you are
> > > conditionally calling cpufreq_update_util depending on whether the task is
> > > visible or not visible in the hierarchy.
> > >
> > > But in dequeue you're unconditionally calling it. Seems a bit inconsistent.
> > > Is this because of util_est or something? Could you add a comment here
> > > explaining why this is so?
> >
> > The big question I have is: in case se != NULL, then it's still visible at the
> > root RQ level.
>
> My understanding is that you get !se at dequeue time when we are
> dequeuing a task from a throttled RQ. Isn't it?

I don't think so? !se means the RQ is not throttled.
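
To illustrate what I mean, here is a minimal userspace sketch of the hierarchy walk that enqueue/dequeue do. It is not the fair.c code: toy_se, rq_throttled and toy_walk_to_root() are made-up names, and the real loop does much more per level. It only shows that se ends up NULL when the walk reaches the root, and non-NULL when it stops early at a throttled level.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a sched_entity and the cfs_rq it is queued on (hypothetical). */
struct toy_se {
	struct toy_se *parent;	/* NULL at the top of the hierarchy */
	bool rq_throttled;	/* is the cfs_rq this entity sits on throttled? */
};

/*
 * Walk from the task's entity towards the root, stopping early at the
 * first throttled level. Returns NULL only if the walk reached the root,
 * otherwise the entity at which the walk stopped.
 */
static struct toy_se *toy_walk_to_root(struct toy_se *se)
{
	for (; se; se = se->parent) {
		if (se->rq_throttled)
			break;
	}
	return se;
}

/* Small self-check covering both outcomes of the walk. */
static void toy_walk_self_check(void)
{
	struct toy_se group = { .parent = NULL, .rq_throttled = false };
	struct toy_se task = { .parent = &group, .rq_throttled = false };

	/* Nothing throttled: the walk runs off the top, so se == NULL. */
	assert(toy_walk_to_root(&task) == NULL);

	/* Group level throttled: the walk stops there, so se != NULL. */
	group.rq_throttled = true;
	assert(toy_walk_to_root(&task) == &group);
}
```

So in this toy model, !se is the "nothing throttled on the way up" case.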

> Thus, this means you are dequeuing a throttled task, I guess for
> example because of a migration.
> However, the point is that a task dequeued from a throttled RQ _is
> already_ not visible from the root RQ, because of the sub_nr_running()
> done by throttle_cfs_rq().

Yes, that's what I was wondering. My point was: if it's already not visible,
then why call schedutil? I felt we should call schedutil only if it's visible,
like you were doing for the other paths.
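
Roughly what I had in mind, as a userspace sketch only. Everything prefixed toy_ is a made-up stand-in (including the flag values and the call counter), and this models my suggestion of a symmetric conditional call, not what the patch currently does at dequeue.

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder flag values for this sketch only (hypothetical). */
#define TOY_CPUFREQ_IOWAIT	(1U << 0)
#define TOY_CPUFREQ_MIGRATION	(1U << 1)

struct toy_task {
	bool in_iowait;
	unsigned long long last_update_time;	/* 0 models "just migrated" */
};

struct toy_rq {
	unsigned int nr_running;
	unsigned int cpufreq_calls;	/* how many times the hook fired */
	unsigned int last_flags;	/* flags passed on the last call */
};

/* Stand-in for the schedutil hook: just records that it was called. */
static void toy_cpufreq_update_util(struct toy_rq *rq, unsigned int flags)
{
	rq->cpufreq_calls++;
	rq->last_flags = flags;
}

/* Enqueue: account the task and notify only if visible from the root. */
static void toy_enqueue(struct toy_rq *rq, const struct toy_task *p,
			bool visible_from_root)
{
	unsigned int cpufreq_flags = 0;

	if (!visible_from_root)
		return;

	rq->nr_running++;
	if (p->in_iowait)
		cpufreq_flags |= TOY_CPUFREQ_IOWAIT;
	if (!p->last_update_time)
		cpufreq_flags |= TOY_CPUFREQ_MIGRATION;
	toy_cpufreq_update_util(rq, cpufreq_flags);
}

/* Dequeue, made symmetric with enqueue: skip the hook when not visible. */
static void toy_dequeue(struct toy_rq *rq, bool visible_from_root)
{
	if (!visible_from_root)
		return;

	rq->nr_running--;
	toy_cpufreq_update_util(rq, 0);
}

/* Small self-check of the visible and not-visible paths. */
static void toy_update_self_check(void)
{
	struct toy_rq rq = { 0 };
	struct toy_task p = { .in_iowait = true, .last_update_time = 0 };

	toy_enqueue(&rq, &p, true);
	assert(rq.cpufreq_calls == 1);
	assert(rq.last_flags == (TOY_CPUFREQ_IOWAIT | TOY_CPUFREQ_MIGRATION));

	toy_dequeue(&rq, false);	/* not visible: no hook call */
	assert(rq.cpufreq_calls == 1);

	toy_dequeue(&rq, true);		/* visible: hook fires with no flags */
	assert(rq.cpufreq_calls == 2 && rq.last_flags == 0);
}
```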

>
> > In that case should we still call the util_est_dequeue and the
> > cpufreq_update_util?
>
> I had a better look at the different code paths and I've possibly come
> up with some interesting observations. Lemme try to summarize them here.
>
> First of all, we need to distinguish between estimated utilization
> updates and schedutil updates, since they respond to two very
> different goals.

I agree with your assessments below, including not calling cpufreq when the
CPU is about to idle.

thanks!

- Joel
