Re: [PATCH] 2.6.3-rc3-mm1: sched-group-power

From: Nick Piggin
Date: Fri Feb 20 2004 - 20:41:38 EST




Rick Lindsley wrote:

So let me try a diagram. Each of these groups of numbers represent a
cpu_group, and the labels to the left are individual sched_domains.

SD1 01234567
SD2-SD3 0123 4567
SD4-SD7 01 23 45 67
SD8-SD15 0 1 2 3 4 5 6 7

Currently, we assume each cpu has a power of 1, so each cpu group in
domains SD8-SD15 would have a power of 1, each cpu group in SD4-SD7
would have a power of 2, each of SD2 and SD3 would have a power of 4,
and collectively, all CPUs as represented in SD1 would have a power of 8.
Of course, we don't really make use of this assumption but this just
enumerates our assumption that all nodes, all cpus are created equal.



Well we used to sum up the number of CPUs in each group, so it
wasn't quite that bad. We assumed all CPUs are created equal.

Your new power code would assign each cpu group a static power other
than this, making SMT pairs, for instance, 1.2 instead of 2. In the
case of four siblings, 1.4 instead of 4. Correct? In the example above,
SD2 and SD3 would have a power rating of 2.4, and SD1 would have a power
rating of 4*1.2 or 4.8, right?



Right.

With your current code, we only consult the power ratings if we've already
decided that we are currently "balanced enough".


Well we do work out the per group loads by dividing with the power
rating instead of cpus-in-the-group too.

I'd go one step further
and say that manipulating for power only makes sense if you have an idle
processor somewhere. If all processors are busy, then short of some
quality-of-process assessment, how can you improve power? (You could
improve fairness, I suppose, but that would require lots more stats and
history than we have here.) If one set of procs is slower than another,
won't that make itself apparent by a longer queue developing there? (or
shorter queues forming somewhere else?) and it being load-balanced
by the existing algorithm? Seems to me we only need to make power
decisions when we want to consider an idle processor stealing a task (a
possibly *running* task) from another processor because this processor
is faster/stronger/better.



Yeah, probably we could change that test to:
if (*imbalance <= SCHED_LOAD_SCALE / 2
&& this_load < SCHED_LOAD_SCALE)

Either way, if the calculation should be done in such a way that
if your CPUs are not idle, then it wouldn't predict a performance
increase.

No doubt there is room for improvement, but hopefully it is now
at a "good enough" stage...

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/