Re: [PATCH] sched/core: Minor optimize pick_next_task() when core-sched enable

From: Vineeth Pillai
Date: Wed Mar 22 2023 - 16:43:42 EST


Merging two threads.

On Tue, Mar 21, 2023 at 5:40 PM Joel Fernandes <joel@xxxxxxxxxxxxxxxxx> wrote:
> >
> > CPU0 and CPU1 are two CPUs on the same core, task0 and task2 are the
> > highest priority tasks on rq0 and rq1 respectively, task2 is @max
> > on the entire core.

> I'm assuming this all starts by rq0 doing a pick and getting task0.
> Because any other selection would go down the whole !need_sync route.
>
I think this could also happen when rq1 starts the pick due to a task2 wakeup
while task0 is running on rq0. In that case, core->core_cookie would be set
and we would take the need_sync path, I guess.

> > In the case that 'max' has a zero cookie, instead of continuing to
> > search for a runnable task on rq0 that matches @max's cookie, we
> > choose idle for rq0 directly.
> > At this time, it is obviously better to choose task1 to run for rq0,
> > which will increase the CPU utilization.
> > Therefore, we queue tasks with zero cookies in core_tree, and record
> > the number of non-zero cookie tasks of each rq to detect the status
> > of the sched-core.
>
> I do remember this as a known issue (more of a known but unimplemented
> optimization) which happens when you have a high priority non-cookie
> task which is in front of several low priority ones on the same
> thread/rq. Adding +Vineeth Pillai to see if he remembers the issue.
>
Yes, I remember this as one of the two issues we noticed but could not get to
fixing. Here, non-cookied tasks are treated as special as a side effect of the
implementation (non-cookied tasks are not in the core rbtree), and hence we
force idle if max is non-cookied and the highest-priority task on the sibling
is cookied.

The other issue was that we don't update the core rbtree when vruntime
changes, which can cause starvation of a cookied task if there is more than
one task with the same cookie on an rq.

> > static inline void dequeue_task(struct rq *rq, struct task_struct *p, int flags)
> > {
> > - if (sched_core_enabled(rq))
> > - sched_core_dequeue(rq, p, flags);
> > + sched_core_dequeue(rq, p, flags);
> >
> > if (!(flags & DEQUEUE_NOCLOCK))
> > update_rq_clock(rq);

> Yeah, this is an absolute no-no, it makes the overhead of the second rb
> tree unconditional.

I agree. Could we keep it conditional by enqueuing 0-cookied tasks only when
coresched is enabled, just like we do for cookied tasks? There is still the
overhead of two trees storing all the runnable tasks, only in a different
order. We would also need to populate the core rbtree from the cfs rbtree on
coresched enable and empty it on coresched disable.

Thanks,
Vineeth