RE: + sched-use-tasklet-to-call-balancing.patch added to -mm tree

From: Christoph Lameter
Date: Fri Nov 10 2006 - 13:50:52 EST


On Thu, 9 Nov 2006, Chen, Kenneth W wrote:

> All results are within noise range. The global tasklet does a fairly
> good job, especially on context switch intensive workloads like aim7,
> volanomark, tbench etc. Note all machines are non-NUMA platforms.

> Based on the data, I think we should make the load balance tasklet one
> per NUMA node instead of one per CPU.

I have done some testing on NUMA with 8p / 4n and 256p / 128n, which seems
to indicate that this is doing very well. Having a global tasklet avoids
concurrent load balancing from multiple nodes, which smooths the balancing
work out over all nodes in a NUMA system. It fixes the issues that we saw
with contention during concurrent load balancing.
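
Roughly, the mechanism under discussion has this shape (a sketch only; the
identifiers below are made up for illustration and are not necessarily the
ones used in the actual patch):

#include <linux/interrupt.h>

/* One global tasklet for all load balancing. */
static void run_rebalance(unsigned long data)
{
	int this_cpu = smp_processor_id();

	/*
	 * Walk this CPU's sched domains and pull tasks as needed.
	 * The same tasklet never runs on two CPUs at once, and a
	 * tasklet that is already scheduled is not queued a second
	 * time, so balancing is serialized machine-wide.
	 */
	rebalance_tick(this_cpu, cpu_rq(this_cpu), NOT_IDLE);
}

static DECLARE_TASKLET(balance_tasklet, run_rebalance, 0);

/* scheduler_tick() kicks the tasklet instead of balancing inline. */
static inline void trigger_load_balance(void)
{
	tasklet_schedule(&balance_tasklet);
}

That serialization is what smooths the balancing work out: requests from
many CPUs collapse into a single softirq run instead of racing each other.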

On an 8p system:

I) Percent of ticks where load balancing was found to be required

II) Percent of ticks where we attempted load balancing but found that
we need to try again because load balancing was already in progress
elsewhere (this inflates (I), since load balancing was found to be
required but we decided to defer; the tasklet was not invoked). A
sketch of this instrumentation follows the numbers below.

        I)      II)
Boot:   70%     ~1%
AIM7:   30%      2%
Idle:   50%   <0.5%

256p:
        I)      II)
Boot:   80%     30%
AIM7:   90%     30%
Idle:   95%     30%
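
For reference, columns I) and II) were gathered roughly along these lines
(a sketch; the counters and the balance_due() check are invented for
illustration, not the instrumentation actually used):

static unsigned long lb_ticks;		/* all scheduler ticks            */
static unsigned long lb_needed;		/* balancing was due       -> I)  */
static unsigned long lb_deferred;	/* balancing ran elsewhere -> II) */

/* Called from the tick path. */
static void count_balance_tick(struct rq *rq)
{
	lb_ticks++;
	if (!balance_due(rq))
		return;

	lb_needed++;
	if (test_bit(TASKLET_STATE_RUN, &balance_tasklet.state))
		lb_deferred++;		/* in progress elsewhere, defer */
	else
		tasklet_schedule(&balance_tasklet);
}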

So on larger systems we have more attempts to concurrently execute the
tasklet, which also inflates I). I would expect that rate to rise further
on even larger systems, until we likely reach a point of continuous load
balancing somewhere in the system at 1024p (it will take some time to get
to such a system). Even at that scale it is likely preferable not to have
concurrency. Concurrency may mean that we concurrently balance huge sched
domains, which causes long delays on multiple processors and may cause
contention between load balancing threads that attempt to move tasks from
the same busy processor.

Suresh noted at the Intel OS forum yesterday that we may be able to avoid
running load balancing of the larger sched domains from all processors.
That would reduce the overhead of scheduling and reduce the percentage of
time that load balancing is active even on very large systems.
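
One possible shape of that idea (again just a sketch, not Suresh's actual
proposal; the SD_ONLY_FIRST flag and should_balance() helper are invented
for illustration): mark the large node-level domains and let only the first
CPU in such a domain's span balance it.

static int should_balance(struct sched_domain *sd, int this_cpu)
{
	/* Small (SMT/core/package) domains: every CPU balances them. */
	if (!(sd->flags & SD_ONLY_FIRST))
		return 1;

	/* Large domains: only the first CPU of the span bothers. */
	return this_cpu == first_cpu(sd->span);
}

rebalance_tick() would then skip domains for which should_balance() returns
0, so the expensive walks over the node-level domains happen on one CPU per
domain instead of on every processor.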

