load balancing regression since commit 367456c7

From: Tim Chen
Date: Tue Apr 10 2012 - 21:06:33 EST


Peter,

We noticed that a hackbench test (./hackbench 100 process 2000)
on a 2-socket Sandy Bridge server has slowed down by a factor
of 4 since commit 367456c7 was applied
(sched: Ditch per cgroup task lists for load-balancing).

Commit 5d6523e (sched: Fix load-balance wreckage) did
not fix the regression.

The profile of 3.4-rc2 shows heavy spinlock contention in the
load_balance path; before commit 367456c7 it accounted for less
than 0.003% of cpu.

When we looked at /proc/schedstat on 3.4-rc2 over the duration of
the run, schedule() was called 13x more often on cpu0, and there were
530x as many schedule() calls that left the processor idle.
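
A comparison of this kind can be made by snapshotting /proc/schedstat
before and after a run on each kernel and differencing the per-cpu
counters. The Python sketch below is only an illustration, not the
tooling behind the numbers in this mail. It assumes the version 15
schedstat layout, in which the third and fourth numbers on a cpu line
are the schedule() call count and the count of schedule() calls that
left the processor idle. The helper names and the sample snapshots are
invented.

```python
# Assumed 0-based positions of the counters on a "cpuN" line in the
# version 15 schedstat layout; verify against the running kernel.
SCHED_COUNT = 2    # schedule() was called
SCHED_GOIDLE = 3   # schedule() left the processor idle


def cpu_counters(snapshot, cpu="cpu0"):
    """Return the integer counters from one cpu line of schedstat text."""
    for line in snapshot.splitlines():
        parts = line.split()
        if parts and parts[0] == cpu:
            return [int(x) for x in parts[1:]]
    raise KeyError(cpu)


def run_delta(before, after, field, cpu="cpu0"):
    """Counter growth between two snapshots taken around one run."""
    return cpu_counters(after, cpu)[field] - cpu_counters(before, cpu)[field]


def ratio(new_delta, old_delta):
    """How many times larger the new kernel's count is than the old one's."""
    return new_delta / old_delta


# Invented snapshots, not measured data.
OLD_BEFORE = "version 15\ntimestamp 1\ncpu0 0 0 1000 100 0 0 0 0 0\n"
OLD_AFTER = "version 15\ntimestamp 2\ncpu0 0 0 3000 110 0 0 0 0 0\n"
NEW_BEFORE = "version 15\ntimestamp 1\ncpu0 0 0 1000 100 0 0 0 0 0\n"
NEW_AFTER = "version 15\ntimestamp 2\ncpu0 0 0 9000 600 0 0 0 0 0\n"

for name, field in (("schedule", SCHED_COUNT), ("idle", SCHED_GOIDLE)):
    old = run_delta(OLD_BEFORE, OLD_AFTER, field)
    new = run_delta(NEW_BEFORE, NEW_AFTER, field)
    print(name, ratio(new, old))
```

On a real system the four snapshots would come from reading
/proc/schedstat immediately before and after the hackbench run on each
kernel.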

There was also a big increase in the try-to-wake-up remote count
(sd->ttwu_wake_remote).

Increase in sd->ttwu_wake_remote for cpu0:

    domain 0     540%
    domain 1    7570%
    domain 2    4426%
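
For reference, a percentage increase of this kind can be derived from
the domain lines of /proc/schedstat. The Python sketch below is a
hypothetical illustration. It assumes the version 15 layout, where each
domain line carries a cpumask followed by 36 counters and
ttwu_wake_remote is the 34th of them; that position should be checked
against the scheduler statistics documentation for the kernel in use.
The sample text is invented and is treated as if it already held
per-run counts. Note that an increase of 540% means 6.4 times the
original count.

```python
# Assumed 0-based index of ttwu_wake_remote among the counters that
# follow the cpumask on a domain line (version 15 layout).
TTWU_WAKE_REMOTE = 33


def domain_counters(snapshot, cpu="cpu0"):
    """Map domain name -> counters for the domains listed under `cpu`."""
    domains = {}
    current = None
    for line in snapshot.splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[0].startswith("cpu"):
            current = parts[0]
        elif parts[0].startswith("domain") and current == cpu:
            # parts[1] is the cpumask; the counters come after it.
            domains[parts[0]] = [int(x) for x in parts[2:]]
    return domains


def percent_increase(old, new):
    """Increase of `new` over `old`, in percent."""
    return (new - old) * 100.0 / old


def remote_wake_increase(old_snapshot, new_snapshot, cpu="cpu0"):
    """Percent increase of ttwu_wake_remote per domain of `cpu`."""
    old = domain_counters(old_snapshot, cpu)
    new = domain_counters(new_snapshot, cpu)
    return {
        name: percent_increase(old[name][TTWU_WAKE_REMOTE],
                               new[name][TTWU_WAKE_REMOTE])
        for name in old
    }


def sample(remote_cpu0, remote_cpu1):
    """Build invented schedstat text with one domain per cpu."""
    def dom(remote):
        fields = [0] * 36
        fields[TTWU_WAKE_REMOTE] = remote
        return "domain0 0000ffff " + " ".join(str(f) for f in fields)
    return "\n".join([
        "version 15", "timestamp 1",
        "cpu0 0 0 0 0 0 0 0 0 0", dom(remote_cpu0),
        "cpu1 0 0 0 0 0 0 0 0 0", dom(remote_cpu1),
    ])


print(remote_wake_increase(sample(200, 7), sample(500, 9)))
```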

I wonder if there is unnecessary load balancing to remote cpus.

Tim


profile for 3.4-rc2

     7.16%  hackbench  [kernel.kallsyms]  [k] _raw_spin_lock
            |
            --- _raw_spin_lock
               |
               |--56.52%-- load_balance
               |          idle_balance
               |          __schedule
               |          schedule
               |          |
               |          |--98.73%-- schedule_timeout
               |          |          |
               |          |          |--97.80%-- unix_stream_recvmsg
               |          |          |          sock_aio_read.part.7
               |          |          |          sock_aio_read
               |          |          |          do_sync_read
               |          |          |          vfs_read
               |          |          |          sys_read
               |          |          |          system_call
               |          |          |          __read_nocancel
               |          |          |          create_worker
               |          |          |          group
               |          |          |          main
               |          |          |          __libc_start_main
               |          |          |





