Re: [PATCH] sched: prevent high load weight tasks suppressing balancing

From: Siddha, Suresh B
Date: Mon Mar 27 2006 - 16:49:32 EST


This breaks HT and MC optimizations.. Consider a DP system with each
physical processor having two HT logical threads.. if there are two
runnable processes running on package-0, with this patch scheduler
will never move one of those processes to package-1..

thanks,
suresh

On Mon, Mar 27, 2006 at 05:33:30PM +1100, Peter Williams wrote:
> Problem:
>
> On systems with more than 2 CPUs it is possible for a single task with a
> high smpnice load weight to suppress load balancing on other CPUs (to
> the one that it's running on) if it is the only runnable task on its
> CPU. E.g. consider a 4-way system (simple SMP system with no HT and
> cores) scenario where a high priority task (nice-20) is running on P0
> and two normal priority tasks running on P1. load balance with smp nice
> code will never be able to detect an imbalance and hence will never move
> one of the normal priority tasks on P1 to idle cpus P2 or P3 as P0 will
> always be identified as the busiest CPU but it has no tasks that can be
> moved.
>
> Solution:
>
> Make sure that only CPUs with tasks that can be moved get selected as
> the busiest queue. This involves ensuring that find_busiest_group()
> only considers groups that have at least one CPU with more than one task
> running as candidates for the busiest group and that
> find_busiest_queue() only considers CPUs that have more than one task
> running as candidates for the busiest run queue.
>
> One effect of this is that load balancing will be abandoned earlier in
> the sequence (i.e. before the double run queue locks are taken prior to
> calling move_tasks() rather than in move_tasks() itself) when there are
> no tasks that can be moved than would be the case without this patch.
>
> Signed-off-by: Peter Williams <pwil3058@xxxxxxxxxxxxxx>
>
> Peter
> PS This doesn't take into account tasks that can't be moved because they
> are pinned to a particular CPU. At this stage, I don't think that it's
> worth the effort to make the changes that would enable this.
> --
> Peter Williams pwil3058@xxxxxxxxxxxxxx
>
> "Learning, n. The kind of ignorance distinguishing the studious."
> -- Ambrose Bierce

> Index: MM-2.6.X/kernel/sched.c
> ===================================================================
> --- MM-2.6.X.orig/kernel/sched.c 2006-03-27 16:00:12.000000000 +1100
> +++ MM-2.6.X/kernel/sched.c 2006-03-27 17:02:53.000000000 +1100
> @@ -2115,6 +2115,7 @@ find_busiest_group(struct sched_domain *
> int local_group;
> int i;
> unsigned long sum_nr_running, sum_weighted_load;
> + unsigned int nr_loaded_cpus = 0; /* where nr_running > 1 */
>
> local_group = cpu_isset(this_cpu, group->cpumask);
>
> @@ -2135,6 +2136,8 @@ find_busiest_group(struct sched_domain *
>
> avg_load += load;
> sum_nr_running += rq->nr_running;
> + if (rq->nr_running > 1)
> + ++nr_loaded_cpus;
> sum_weighted_load += rq->raw_weighted_load;
> }
>
> @@ -2149,7 +2152,7 @@ find_busiest_group(struct sched_domain *
> this = group;
> this_nr_running = sum_nr_running;
> this_load_per_task = sum_weighted_load;
> - } else if (avg_load > max_load) {
> + } else if (avg_load > max_load && nr_loaded_cpus) {
> max_load = avg_load;
> busiest = group;
> busiest_nr_running = sum_nr_running;
> @@ -2158,7 +2161,7 @@ find_busiest_group(struct sched_domain *
> group = group->next;
> } while (group != sd->groups);
>
> - if (!busiest || this_load >= max_load || busiest_nr_running <= 1)
> + if (!busiest || this_load >= max_load)
> goto out_balanced;
>
> avg_load = (SCHED_LOAD_SCALE * total_load) / total_pwr;
> @@ -2260,16 +2263,16 @@ out_balanced:
> static runqueue_t *find_busiest_queue(struct sched_group *group,
> enum idle_type idle)
> {
> - unsigned long load, max_load = 0;
> - runqueue_t *busiest = NULL;
> + unsigned long max_load = 0;
> + runqueue_t *busiest = NULL, *rqi;
> int i;
>
> for_each_cpu_mask(i, group->cpumask) {
> - load = weighted_cpuload(i);
> + rqi = cpu_rq(i);
>
> - if (load > max_load) {
> - max_load = load;
> - busiest = cpu_rq(i);
> + if (rqi->raw_weighted_load > max_load && rqi->nr_running > 1) {
> + max_load = rqi->raw_weighted_load;
> + busiest = rqi;
> }
> }
>

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/