Re: [External] Re: [PATCH] sched/core: Minor optimize pick_next_task() when core-sched enable

From: Hao Jia
Date: Thu Mar 23 2023 - 03:04:13 EST




On 2023/3/23 Vineeth Pillai wrote:
Merging two threads.

On Tue, Mar 21, 2023 at 5:40 PM Joel Fernandes <joel@xxxxxxxxxxxxxxxxx> wrote:

CPU0 and CPU1 are two CPUs on the same core, task0 and task2 are the
highest priority tasks on rq0 and rq1 respectively, task2 is @max
on the entire core.

I'm assuming this all starts by rq0 doing a pick and getting task0.
Because any other selection would go down the whole !need_sync route.

I think this could also happen when rq1 starts the pick due to task2 wakeup
while task0 was running in rq0. In this case, core->core_cookie would be set
and we take the need_sync path I guess.

In the case that 'max' has a zero cookie, instead of continuing to
search for a runnable task on rq0 that matches @max's cookie, we
choose idle for rq0 directly.
At this time, it is obviously better to choose task1 to run for rq0,
which will increase the CPU utilization.
Therefore, we queue tasks with zero cookies in core_tree, and record
the number of non-zero cookie tasks of each rq to detect the status
of the sched-core.

I do remember this as a known issue (more of a known but unimplemented
optimization) which happens when you have a high priority non-cookie
task which is in front of several low priority ones on the same
thread/rq. Adding +Vineeth Pillai to see if he remembers the issue.

Yes, I remember this as one of the 2 issues we noticed, but could not get to
fixing it. Here we have non-cookied tasks considered special as a side effect
of implementation(non-cookied tasks not in core rbtree) and hence we force idle
if max is non-cookied and the highest prio task on the sibling is cookied.

The other issue was - we don't update core rbtree when vruntime changes and
this can cause starvation of cookied task if there are more than one task with
the same cookie on an rq.


If I understand correctly, when a cookied task is enqueued, the difference delta1 between its vruntime and min_vruntime is very large.

Another task with the same cookie is very actively dequeuing and enqueuing, and the difference delta2 between its vruntime and min_vruntime is always smaller than delta1?
I'm not sure if this is the case?

static inline void dequeue_task(struct rq *rq, struct task_struct *p, int flags)
{
- if (sched_core_enabled(rq))
- sched_core_dequeue(rq, p, flags);
+ sched_core_dequeue(rq, p, flags);

if (!(flags & DEQUEUE_NOCLOCK))
update_rq_clock(rq);

Yeah, this is an absolute no-no, it makes the overhead of the second rb
tree unconditional.

I agree. Could we keep it conditional by enqueuing 0-cookied tasks only when
coresched is enabled, just like what we do for cookied tasks? This is still an
overhead where we have two trees storing all the runnable tasks but in
different order. We would also need to populate core rbtree from cfs rbtree
on coresched enable and empty the tree on coresched disable.


I'm not sure if the other way is reasonable, I'm trying to provide a function for each scheduling class to find a highest priority non-cookie task.

For example fair_sched_class, we can use rq->cfs_tasks to traverse the search. But this search may take a long time, maybe we need to limit the number of searches.

Thanks,
Hao