Re: About CPU's Load Balance and CFS functions

From: Peter Zijlstra
Date: Mon Sep 07 2009 - 15:20:01 EST

On Mon, 2009-09-07 at 16:14 +0800, lookeylam wrote:
> Hello:
> I am not sure this is the right mailing list for this
> question, but I will give it a try.
> I ran a test on a Dell 1950 with 8 cpus, benchmarking apache
> with the ab command. On linux 2.6.18, the processes forked by
> apache are not well distributed across these 8 cpus.
> linux 2.6.23 is a little better than 2.6.18, but some cpus are
> still busy while others remain idle.
> On 2.6.30, all 8 cpus are well used and the utilization of each
> cpu is nearly the same. And when I start a control group of the
> cpuset type with sched_relax_domain_level set (with value 3, 4 or 5),
> the ab result is 50ms better than the result without the control
> group.
> I attribute this to load_balance rather than CFS, because CFS is
> just a scheduler for organizing the processes inside one cpu, while
> load_balance is what controls the distribution of processes and load
> between different cpus.
> But having reached this conclusion, I am confused about how
> load_balance differs between these three kernels.
> My questions are: is the above conclusion right or not? How does
> this situation arise, and why? I read the kernel code but I am
> still not sure.

Load-balancing is generally considered part of the scheduler as a whole.
CFS is indeed the cpu scheduler, but it and the load-balancer are
related because they do have to work together.

Now, in the past 3+ years the load-balancer has undergone significant
changes too -- and we're now again poking at it; 2.6.32 will likely have
quite radical changes to the whole load balancer.

The sched_relax_domain_level knob is one that controls one of the
coupling mechanisms, namely wake on idle, that is, we try and push newly
woken tasks away to idle cpus. The level you put in there is related to
the sched_domain level.
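
As a rough illustration of how that knob is set from userspace, here
is a minimal Python sketch. It assumes the cpuset hierarchy is already
mounted and that the file is called cpuset.sched_relax_domain_level
(on a legacy "mount -t cpuset" setup it has no "cpuset." prefix, so
the name is a parameter). The helper names and the /dev/cpuset/apache
path in the usage note are hypothetical; the level descriptions follow
the kernel's cpusets documentation.

```python
from pathlib import Path

# Level meanings as listed in the kernel's cpusets documentation.
RELAX_LEVELS = {
    -1: "no request; use system default or follow request of others",
    0: "no search",
    1: "search siblings (hyper-threads in a core)",
    2: "search cores in a package",
    3: "search cpus in a node (system wide on non-NUMA systems)",
    4: "search nodes in a chunk of node (NUMA systems)",
    5: "search system wide (NUMA systems)",
}

# Assumed file name; pass knob="sched_relax_domain_level" if the
# hierarchy was mounted without the "cpuset." prefix.
DEFAULT_KNOB = "cpuset.sched_relax_domain_level"


def set_relax_level(cpuset_dir, level, knob=DEFAULT_KNOB):
    """Write `level` into the knob file of the given cpuset directory."""
    if level not in RELAX_LEVELS:
        raise ValueError(f"unsupported relax level: {level}")
    path = Path(cpuset_dir) / knob
    path.write_text(f"{level}\n")
    return path


def get_relax_level(cpuset_dir, knob=DEFAULT_KNOB):
    """Read the current level back from the knob file as an integer."""
    return int((Path(cpuset_dir) / knob).read_text().strip())
```

Something like set_relax_level("/dev/cpuset/apache", 3) would then ask
for idle-cpu searches up to the node level for tasks in that cpuset;
writing -1 hands the decision back to the system default.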

Normally we don't try and push newly woken tasks too far away, because
that'll increase the remote access penalty for related tasks, but some
workloads have lots of very short running unrelated tasks which do
benefit from this.

Anyway, I would suggest you keep an eye out for scheduler patches if
you're interested in this, all the scheduler development happens in