RE: + sched-use-tasklet-to-call-balancing.patch added to -mm tree

From: Christoph Lameter
Date: Mon Nov 13 2006 - 00:45:54 EST


On Sun, 12 Nov 2006, Chen, Kenneth W wrote:

> The key is that load balance scans from lowest SMT domain and traverse
> upwards to numa allnodes domain. The CPU check and break is placed at
> the end of the for loop so it is effectively shorting out i+1 iteration.

It shorts out only if the current cpu is not the first cpu of the span. But
because the check sits at the end of the loop, by that point it has already
done load balancing for the cpu that is not the first cpu of the span!
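
To illustrate the ordering problem, here is a toy model of the domain walk.
It is only a sketch: struct toy_domain and toy_levels_balanced() are
hypothetical names, not the scheduler's real structures, and the counter
increment merely stands in for the actual balancing call at each level.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for a scheduler domain. */
struct toy_domain {
    int first_cpu;                    /* first cpu in this domain's span */
    const struct toy_domain *parent;  /* next level up, NULL at the top */
};

/*
 * Walk the domains bottom-up. The balancing step runs first and the
 * "am I the first cpu of the span?" check runs at the END of the loop
 * body, so the check can only skip the levels above the current one.
 * Returns how many levels were balanced by this cpu.
 */
int toy_levels_balanced(int cpu, const struct toy_domain *sd)
{
    int balanced = 0;

    for (; sd != NULL; sd = sd->parent) {
        balanced++;                /* stands in for balancing this level */
        if (cpu != sd->first_cpu)
            break;                 /* too late for this level, it already ran */
    }
    return balanced;
}
```

With two levels that both start at cpu 0, cpu 0 balances both levels, while
cpu 1 still balances the lowest level once before the check stops it.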

> (1) we should extend the logic to all rebalance tick, not just busy tick.

Ok.

> (2) we should initiate load balance within a domain only from least
> loaded group.

This would mean we would have to determine the least loaded group first.
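
Roughly, that extra step is a scan over all groups of the domain before any
balancing can start. A minimal sketch, assuming a hypothetical struct
toy_group that carries one precomputed load value per group (this is not the
kernel's struct sched_group):

```c
#include <stddef.h>

/* Hypothetical per-group summary used only for this illustration. */
struct toy_group {
    unsigned long load;   /* assumed to be the summed load of the group */
};

/*
 * Return the index of the least loaded group, or -1 if there are no
 * groups. Ties go to the lowest index. Note that every group has to be
 * examined before the answer is known.
 */
int toy_least_loaded(const struct toy_group *groups, size_t n)
{
    size_t i, best = 0;

    if (n == 0)
        return -1;

    for (i = 1; i < n; i++) {
        if (groups[i].load < groups[best].load)
            best = i;
    }
    return (int)best;
}
```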

> (3) the load scanning should be done only once per interval per domain.
> Currently, it is scanning load for each CPU within a domain. Even with
> the patch, the scanning is cut down to one per group. That is still
> too much. Large system that has hundreds of groups in numa allnodes
> will end up scanning / calculate load over and over again. That should
> be cut down as well.

Yes, we want that. Maybe we could remember the load calculated in
sched_group and use that for a certain time period? Load would then be
calculated only once per period, and all other processors would make their
decisions based on the cached load?
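
Something along these lines, as a sketch only. The struct, its fields and
toy_group_load() are made up for illustration and are not existing
sched_group members; locking and cross-cpu visibility are ignored entirely.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cache that would hang off a group; illustrative names only. */
struct toy_group_cache {
    unsigned long load;    /* last calculated load */
    unsigned long stamp;   /* tick at which it was calculated */
    bool valid;            /* false until the first calculation */
    unsigned int scans;    /* number of full scans, for illustration */
};

/*
 * Return the group's load, rescanning the per-cpu loads only if the
 * cached value is missing or at least max_age ticks old. The unsigned
 * subtraction keeps the age check correct across tick counter wraparound.
 */
unsigned long toy_group_load(struct toy_group_cache *g,
                             const unsigned long *cpu_load, size_t ncpus,
                             unsigned long now, unsigned long max_age)
{
    size_t i;

    if (g->valid && now - g->stamp < max_age)
        return g->load;            /* reuse what another cpu calculated */

    g->load = 0;
    for (i = 0; i < ncpus; i++)
        g->load += cpu_load[i];    /* the expensive scan, done once */
    g->stamp = now;
    g->valid = true;
    g->scans++;
    return g->load;
}
```

The obvious tradeoff is staleness: within max_age ticks every processor acts
on the same number even if the real load has moved in the meantime.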

> Part of all this problem probably stemmed from "load balance" is incapable
> of performing l-d between arbitrary pair of CPUs, and tightly tied load scan
> and actual l-d action. And on top of that l-d is really a pull operation
> to current running CPU. All these limitations dictate that every CPU somehow
> has to scan and pull. It is extremely inefficient on large system.

Right. However, if we follow this line of thought then we will be
redesigning the load balancing logic.
