Re: [PATCH] Fix to excess pre-schedule migrating during Real Time overload on multiple CPUs.

From: Steven Rostedt
Date: Fri Jul 29 2011 - 08:01:28 EST


On Fri, 2011-07-29 at 07:30 +0200, uberj@xxxxxxxxxxxxx wrote:
> From cf87d70969357f46e5c28804ac39c2139652af8f Mon Sep 17 00:00:00 2001
> From: Jacques Uber <uberj@xxxxxxxxxxxxx>
> Date: Thu, 28 Jul 2011 17:49:38 -0700
> Subject: [PATCH] Fix to excess pre-schedule migrating during Real Time
> overload on multiple CPUs.
>
> If multiple processors are under real time overload there are situations where
> a processor will pull multiple tasks from multiple CPU's while trying to
> schedule the highest priority task within it's root domain. As an example, if
> there are four CPU's all under overload and the highest priority task on CPU3
> yields its time slice, CPU3 will try to pull the next highest priority
> task onto
> its the runqueue. Without the patch, if tasks were ordered in ascending order
> on CPU[0-2], CPU3 would pull all the tasks, ruining any possibility of using
> cached data.

As I stated on IRC, this is a theoretical situation, as real RT systems
would bind important RT tasks to CPUs, and those non-RT systems that
happen to run a bunch of RT tasks, usually have those RT tasks set up at
the same priority. If this situation were to happen, it would seldom
happen. It is likely that a bunch of overloaded RT tasks, would bounce
anyway regardless how you do this.

>
> The simple solution to this scenario is to maintain a "best choice"
> for CPU3 to
> pull to it's run queue. Only after checking all the queued tasks and choosing
> which one should be ran, will it deactivate, set_task, and reactivate the task
> onto it's runqueue.

There's a major bug in this solution. You have to release the rq locks,
which means that once you found your "best" task, it may no longer be
available to pull.

Also, your patch is broken because it tests for the pull inside the if
statement, and it looks like you just pulled all tasks.


>
> This patch was inspired by a Linux Journal article about the Real Time
> Scheduler.
> http://www.linuxjournal.com/magazine/real-time-linux-kernel-scheduler?page=0,3
>
> Signed-off-by: Jacques Uber <uberj@xxxxxxxxxxxxx>
> Signed-off-by: Kevin Strasser <strassek@xxxxxxxxxxxxx>
> ---
> kernel/sched_rt.c | 11 ++++++++---
> 1 files changed, 8 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/sched_rt.c b/kernel/sched_rt.c
> index 97540f0..067f159 100644
> --- a/kernel/sched_rt.c
> +++ b/kernel/sched_rt.c
> @@ -1482,6 +1482,7 @@ static int pull_rt_task(struct rq *this_rq)
> {
> int this_cpu = this_rq->cpu, ret = 0, cpu;
> struct task_struct *p;
> + struct task_struct *best = NULL;

broken whitespace throughout.

> struct rq *src_rq;
>
> if (likely(!rt_overloaded(this_rq)))
> @@ -1540,9 +1541,10 @@ static int pull_rt_task(struct rq *this_rq)
>
> ret = 1;
>
> - deactivate_task(src_rq, p, 0);
> - set_task_cpu(p, this_cpu);
> - activate_task(this_rq, p, 0);
> + if(!best)
> + best = p;
> + else if(p->prio > best->prio)
> + best = p;
> /*
> * We continue with the search, just in
> * case there's an even higher prio task
> @@ -1550,6 +1552,9 @@ static int pull_rt_task(struct rq *this_rq)
> * but possible)
> */
> }
> + deactivate_task(src_rq, p, 0);
> + set_task_cpu(p, this_cpu);
> + activate_task(this_rq, p, 0);

This is still inside the loop. You just pulled p without checking
anything. 'best' is not used at all. You just made things worse.

Before touching the scheduler, always have a test case that shows the
problem, and can show that a patch solves that problem. The scheduler is
too critical to the entire system to just apply random patches.

-- Steve

> skip:
> double_unlock_balance(this_rq, src_rq);
> }


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/