Re: busted CFS group load balancer?

From: Chris Friesen
Date: Mon Nov 17 2008 - 16:19:38 EST


Ken Chen wrote:
> On Mon, Nov 17, 2008 at 7:37 AM, Chris Friesen wrote:
>> It appears that the fair-group load balancer in 2.6.27 does not work
>> properly.
>
> There was an issue fixed post 2.6.27 where the load balancer didn't work
> properly if there was one task per group per cpu. You might try
> backporting commit 38736f4 and see if that helps.

I tested git commit 38736f4; it doesn't fix the problem I'm seeing.


I plugged the same weights into my test app (groups 1 and 2 instead of ant/bee) and got the results below for a 10-second run. The "actual" column gives the group's overall average, followed by the value for each hog separately. In this case both tasks in group 2 ended up sharing a cpu with one of the tasks from group 1.

group  actual(%)             expected(%)  ctx switches  max_latency(ms)
1      99.69 (99.38/99.99)   99.81        160/262       4/0
2       0.31 ( 0.31/ 0.31)    0.19         32/33        391/375
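
As a rough cross-check of the numbers above, here is a small Python sketch (not the test app itself). It recomputes the per-group averages from the per-hog values and shows that the figures are consistent with both group 2 hogs sharing a cpu with the first group 1 hog. Which hog ran on which cpu is inferred from the percentages, not measured. The `expected_share` helper assumes that "expected" is simply each group's weight divided by the sum of all weights; the actual weights are not given in this message, so none are filled in here.

```python
# Per-hog CPU percentages copied from the "actual" column of the table.
actual = {
    1: [99.38, 99.99],
    2: [0.31, 0.31],
}


def group_average(samples):
    """Mean CPU percentage across one group's hogs."""
    return sum(samples) / len(samples)


def expected_share(weights):
    """Assumed model: each group's percentage is proportional to its weight.

    `weights` maps group id -> weight. This is a hypothetical helper; the
    real weights used in the test are not stated in this message.
    """
    total = sum(weights.values())
    return {group: 100.0 * w / total for group, w in weights.items()}


averages = {group: group_average(v) for group, v in actual.items()}

# Inferred placement: the 99.38% hog from group 1 shares a cpu with both
# group 2 hogs, while the 99.99% hog has the other cpu to itself.
shared_cpu = actual[1][0] + sum(actual[2])
other_cpu = actual[1][1]

print("group averages:", averages)
print("shared cpu total: %.2f" % shared_cpu)
print("other cpu total:  %.2f" % other_cpu)
```

The shared cpu adds up to roughly 100%, which is what you would see if the balancer never moved one of the group 2 hogs over to the other cpu.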

I've only got a 2-way system. If the results really are that much worse on larger systems, then that's going to cause problems for us as well. I'll see if I can get some time on a bigger machine.

Chris