Re: [RFC][PATCH 0/9] sched: Power scheduler design proposal

From: Arjan van de Ven
Date: Tue Jul 16 2013 - 15:57:50 EST


On 7/16/2013 12:21 PM, Peter Zijlstra wrote:

> Suppose a 2 cpu system, one cpu is running at 3/4 throttle, the other is
> running at half speed. Both cpus are equally utilized. A new task
> comes on.
>
> Where do we run it?
>
> We need to know that there's head-room on the 1/2 speed cpu and should
> crank its pace and place the task there.

ok so you are interested in past "real" utilization of the hardware resources;
that is available generally (and tends to come from hardware counters, on ARM
as well).

you may not get it as a percentage, but in some absolute terms, so you
can know which of the two is least loaded... that might be enough

Today cpufreq uses a library to get these counters; moving that library to the scheduler
(or some similar place) sounds like a great idea.
There is an argument for what to do on systems where such counters are either
absent or very expensive, and that's a good question; maybe one of the ARM folks
can say how expensive these counters are for them, to see if there really is such
a problem?

> Even without the new task, it's not a 'balanced' situation, but it
> appears that way because the cpus are nearly equally utilized. Maybe if
> we crank one cpu to the max it could run all tasks and have the other
> cpu power gated. Or maybe they could both drop to 60% and run equal
> loads.

which way is better for energy consumption is likely a per-arch question,
and having the architecture provide some runtime configuration about how
valuable it is to spread out sounds sensible to me.

then there's the question of how much capacity remains; this is a hard one, and not just
for Intel. Almost all mobile devices today are thermally constrained, ARM and Intel
alike (at least the higher-performance ones)... the curse of wanting very thin and light
phones that are made of thermally isolating plastic (so that radio waves can go through)
and have a nice and bright screen...

With thermals as a whole you tend to not know you're hitting the wall until you try;
you may think you can go another gigahertz on a core, but when you go there you near instantly
hit a thermal limit that whacks you waaaay back down again.

(that reminds me, I'd love to investigate having the scheduler look at core temperature as one of the
factors in its decisions... that might actually be one of the more interesting inputs to
scheduler decisions, both in terms of capacity planning and efficiency)


> We need feedback for these problems; but you're telling us new Intel
> stuff can't really tell us much of anything :/

s/new/existing/ to be honest; chips we've been selling in the last 4+ years.

> What I'm saying is: sure, the cpufreq driver might have chip-specific
> magic, but it very much needs to tell us things too; we can't have it do
> its own thing and not care.

some of that information may come from sources other than the P-state selection part;
a lot of the things you're asking for will tend to come from counters, I suspect.
