Max Krasnyansky wrote:
Ah, makes sense.
Dimitri Sivanich wrote:
kernel: CPU3 root domain e0000069ecb20000
kernel: CPU3 attaching sched-domain:
kernel: domain 0: span 3 level NODE
kernel: groups: 3
kernel: CPU2 root domain e000006884a00000
kernel: CPU2 attaching sched-domain:
kernel: domain 0: span 2 level NODE
kernel: groups: 2
kernel: CPU1 root domain e000006884a20000
kernel: CPU1 attaching sched-domain:
kernel: domain 0: span 1 level NODE
kernel: groups: 1
kernel: CPU0 root domain e000006884a40000
kernel: CPU0 attaching sched-domain:
kernel: domain 0: span 0 level NODE
kernel: groups: 0
Which is the way sched_load_balance is supposed to work. You need to set
sched_load_balance=0 for all cpusets containing any cpu you want to disable
balancing on, otherwise some balancing will happen.
It won't be much balancing in this case, because there is just one cpu per
domain.
In other words, no, that's not how it is supposed to work. There is code in
cpu_attach_domain() that is supposed to remove redundant levels
(the sd_degenerate() stuff). There is an explicit check in there for numcpus == 1.
BTW, the reason you got a different result than I did is that you have a
NUMA box, whereas mine is UMA. I was able to reproduce the problem, though, by
enabling the multi-core scheduler. In that case I also get one redundant domain
level, CPU, with a single CPU in it.
So we definitely need to fix this. I'll try to poke around tomorrow and figure
out why the redundant level is not dropped.
You were not using the latest kernel, were you?
There was a bug in the sd_degenerate() code, and it has already been fixed:
http://lkml.org/lkml/2008/11/8/10
Yes.
So when we do that for just par3, we get the following:
echo 0 > par3/cpuset.sched_load_balance
kernel: cpusets: rebuild ndoms 3
kernel: cpuset: domain 0 cpumask
00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,0
kernel: cpuset: domain 1 cpumask
00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,0
kernel: cpuset: domain 2 cpumask
00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,0
kernel: CPU3 root domain default
kernel: CPU3 attaching NULL sched-domain.
So the def_root_domain is now attached for CPU 3. And we do have a NULL
sched-domain, which we expect for a cpu with load balancing turned off. If
we turn sched_load_balance off ('0') on each of the other cpusets (par0-2),
each of those cpus would also have a NULL sched-domain attached.
Ok. This one is a bug in cpuset.c:generate_sched_domains(). The sched domain
generator in cpusets should not drop domains with a single cpu in them when
sched_load_balance==0. I'll look at that tomorrow too.
Do you mean the correct behavior should be the following?
kernel: cpusets: rebuild ndoms 4
But why do you think this is a bug? In generate_sched_domains(), cpusets with
sched_load_balance==0 will be skipped:
list_add(&top_cpuset.stack_list, &q);
while (!list_empty(&q)) {
	...
	if (is_sched_load_balance(cp)) {
		csa[csn++] = cp;
		continue;
	}
	...
}
Correct me if I misunderstood your point.
The problem is that all cpus in cpusets with sched_load_balance==0 end up in
the default root_domain, which causes lock contention.