Re: [RFC PATCH v2 09/17] sched: Introduce sched_class::pick_task()

From: Aaron Lu
Date: Mon Apr 29 2019 - 01:38:26 EST


On Tue, Apr 23, 2019 at 04:18:14PM +0000, Vineeth Remanan Pillai wrote:
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index c055bad249a9..45d86b862750 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -4132,7 +4132,7 @@ pick_next_entity(struct cfs_rq *cfs_rq, struct sched_entity *curr)
> * Avoid running the skip buddy, if running something else can
> * be done without getting too unfair.
> */
> - if (cfs_rq->skip == se) {
> + if (cfs_rq->skip && cfs_rq->skip == se) {
> struct sched_entity *second;
>
> if (se == curr) {
> @@ -4150,13 +4150,13 @@ pick_next_entity(struct cfs_rq *cfs_rq, struct sched_entity *curr)
> /*
> * Prefer last buddy, try to return the CPU to a preempted task.
> */
> - if (cfs_rq->last && wakeup_preempt_entity(cfs_rq->last, left) < 1)
> + if (left && cfs_rq->last && wakeup_preempt_entity(cfs_rq->last, left) < 1)
> se = cfs_rq->last;
>
> /*
> * Someone really wants this to run. If it's not unfair, run it.
> */
> - if (cfs_rq->next && wakeup_preempt_entity(cfs_rq->next, left) < 1)
> + if (left && cfs_rq->next && wakeup_preempt_entity(cfs_rq->next, left) < 1)
> se = cfs_rq->next;
>
> clear_buddies(cfs_rq, se);
> @@ -6937,6 +6937,37 @@ static void check_preempt_wakeup(struct rq *rq, struct task_struct *p, int wake_
> set_last_buddy(se);
> }
>
> +static struct task_struct *
> +pick_task_fair(struct rq *rq)
> +{
> + struct cfs_rq *cfs_rq = &rq->cfs;
> + struct sched_entity *se;
> +
> + if (!cfs_rq->nr_running)
> + return NULL;
> +
> + do {
> + struct sched_entity *curr = cfs_rq->curr;
> +
> + se = pick_next_entity(cfs_rq, NULL);
> +
> + if (!(se || curr))
> + return NULL;

I think you have already avoided the null pointer access bug in
the above pick_next_entity() by doing multiple checks for null pointers:
cfs_rq->skip and left.

An alternative way to fix the null pointer access bug: if curr is the
only runnable entity in this cfs_rq, there is no need to call
pick_next_entity(cfs_rq, NULL) since the rbtree is empty. This way
pick_next_entity() doesn't need change. something like:

do {
struct sched_entity *curr = cfs_rq->curr;

if (curr && curr->on_rq && cfs_rq->nr_running == 1)
se = NULL;
else
se = pick_next_entity(cfs_rq, NULL);

/* the following code doesn't change */
> +
> + if (curr) {
> + if (se && curr->on_rq)
> + update_curr(cfs_rq);
> +
> + if (!se || entity_before(curr, se))
> + se = curr;
> + }
> +
> + cfs_rq = group_cfs_rq(se);
> + } while (cfs_rq);
> +
> + return task_of(se);
> +}

There is another problem I'm thinking: suppose cpu0 and cpu1 are
siblings and task A, B are runnable on cpu0 and curr is A. When cpu1
schedules, pick_task_fair() will also be called for cpu0 to decide
which CPU's task to preempt the other.

When pick_task_fair() is called for cpu0 due to cpu1 schedules:
curr(i.e. A) may only run a few nanoseconds, and thus can have a higher
vruntime than B. So we chose B to fight with task chosen from cpu1. If
B wins, we will schedule B on cpu0. If B loses, we will probably
schedule idle on cpu0(if cookie unmatch). Either case, A didn't get its
share. We probably want to make sure a task at least running for some
time before being considered to be preempted.