Re: [discussion]sched: a rough proposal to enable power saving in scheduler

From: Juri Lelli
Date: Sun Aug 19 2012 - 06:12:15 EST


Hi all,
I can probably add some bits to the discussion; after all, I'm preparing a talk for Plumbers that is closely related :-). My points are not CFS-related (so feel free to ignore me), but they may be of interest if we are talking about power aware scheduling in Linux in general.

On 08/16/2012 04:31 PM, Morten Rasmussen wrote:
Hi all,

On Wed, Aug 15, 2012 at 12:05:38PM +0100, Peter Zijlstra wrote:

sub proposal:
1, Is it possible to balance tasks onto the idlest cpu rather than the
appointed 'balance cpu'? If so, it may save one round of balancing.
The idlest cpu would preferably be a newly idle cpu, otherwise the
least loaded cpu;
2, se or task load is good for setting the running time, but it should
be the second basis in load balancing. The first basis of LB is the
number of running tasks in the group/cpu: whatever the weight of the
group is, if the number of tasks is less than the number of cpus, the
group still has capacity to take more tasks. (SMT cpu power and
big/little cpu capacity on ARM will also be considered.)

Ah, no we shouldn't balance on nr_running, but on the amount of time
consumed. Imagine two tasks being woken at the same time, both tasks
will only run a fraction of the available time, you don't want this to
exceed your capacity because, run back to back, the one cpu will still
be mostly idle.

What you want is to keep track of a per-cpu utilization level (inverse
of idle-time) and, using PJT's per-task runnable avg, see if placing the
new task on it will exceed the utilization limit.
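
As a rough illustration of the utilization check described above, here
is a toy Python model (not kernel code). The names fits() and
pick_cpu(), the percent units and the 80% limit are all assumptions made
up for this example; the per-task figure stands in for a runnable
average such as the one PJT's patches track.

```python
def fits(cpu_util, task_avg, limit=80):
    """True if adding the task keeps the cpu at or below the limit.

    All values are integer percentages of cpu time (hypothetical units).
    """
    return cpu_util + task_avg <= limit


def pick_cpu(cpu_utils, task_avg, limit=80):
    """Pack the task onto the busiest cpu that still has room.

    Returns the cpu index, or None if no cpu can take the task
    without exceeding the limit.
    """
    candidates = [(util, cpu) for cpu, util in enumerate(cpu_utils)
                  if fits(util, task_avg, limit)]
    if not candidates:
        return None
    # Highest utilization first, so mostly idle cpus are left alone.
    return max(candidates)[1]
```

With two tasks that each run about 10% of the time, a cpu has
nr_running == 2 but only 20% utilization, so fits(20, 25) still accepts
a third task. Counting tasks alone would not tell you that.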

I think some of the Linaro people actually played around with this,
Vincent?


I agree. Better measures of cpu load and task weight than nr_running
and the current task load weight are necessary to do proper task
packing.

I have used PJT's per-task load-tracking for scheduling experiments on
heterogeneous systems and my experience is that it works quite well for
determining the load of a specific task. Something like PJT's work
would be a good starting point for power aware scheduling and better
support for heterogeneous systems.


I haven't tried PJT's work myself (it's on my todo list), but with SCHED_DEADLINE you can see the picture from the other side: instead of tracking per-task load, you can prevent a task from exceeding its allowed "load".
This is done by reserving some fraction of CPU time (runtime or budget) in every predefined interval of time (period). This allocated bandwidth is then enforced with proper scheduling mechanisms (BTW, I have another talk at Plumbers explaining the SCHED_DEADLINE patchset in more detail).
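
To make the reservation idea concrete, here is a minimal sketch in Python (a toy model, not code from the patchset). It treats a reservation as a (runtime, period) pair, computes its bandwidth as runtime/period, and accepts a new reservation on one CPU only while the total stays within a cap. The function names, the microsecond units and the 100% default cap are assumptions chosen for the example.

```python
from fractions import Fraction


def bandwidth(runtime_us, period_us):
    """Fraction of one CPU reserved by a (runtime, period) pair."""
    if not 0 < runtime_us <= period_us:
        raise ValueError("need 0 < runtime <= period")
    return Fraction(runtime_us, period_us)


def admit(reserved, runtime_us, period_us, cap=Fraction(1)):
    """Accept the new reservation only if total bandwidth stays <= cap.

    `reserved` is a list of (runtime_us, period_us) pairs already
    accepted on this CPU. The default cap of one full CPU is an
    assumption made for this sketch.
    """
    total = sum((bandwidth(r, p) for r, p in reserved), Fraction(0))
    return total + bandwidth(runtime_us, period_us) <= cap
```

For instance, with 10ms/100ms and 30ms/100ms already reserved, a request for 50ms/100ms brings the total to 90% and is accepted, while a 60ms/100ms request on top of an existing 50ms/100ms would be refused.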

One of the biggest challenges here for load-balancing is translating
task load from one cpu to another as the task load is influenced by the
total load of its cpu. So a task that appears to be heavy on an
oversubscribed cpu might not be so heavy after all when it is moved to a
cpu with plenty of cpu time to spare. This issue is likely to be more
pronounced on heterogeneous systems and systems with aggressive
frequency scaling. It might be possible to avoid having to translate
load, or it might not really matter, but I haven't completely convinced
myself yet.
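
A small worked example of the translation problem described above,
written as a toy Python model. It assumes that a task's tracked load is
the share of a period it spends runnable (running or waiting on the
runqueue), and that run time scales linearly with a relative cpu
capacity figure. Both are simplifications, and the function names and
numbers are hypothetical.

```python
from fractions import Fraction


def runnable_fraction(run_ms, wait_ms, period_ms):
    """Share of the period the task spent runnable (running or waiting)."""
    return Fraction(run_ms + wait_ms, period_ms)


def translate_run_time(run_ms, src_capacity, dst_capacity):
    """Run time needed on the destination cpu for the same work.

    Assumes run time scales linearly with relative capacity.
    """
    return Fraction(run_ms * src_capacity, dst_capacity)


# Observed on a busy cpu of capacity 512: runs 3 ms, waits 5 ms per 10 ms.
seen_on_src = runnable_fraction(3, 5, 10)

# Same work on an idle cpu of capacity 1024: half the run time, no waiting.
run_on_dst = translate_run_time(3, 512, 1024)
seen_on_dst = runnable_fraction(run_on_dst, 0, 10)
```

Under these assumptions the task looks like an 80% task where it was
measured but only a 15% task on the destination cpu, which is the gap a
load balancer would have to account for.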


This is probably a key point where deadline scheduling could be helpful. A task's load in this case cannot be influenced by other tasks in the system, and it is one of the known variables. Actually, this is only half true: isolation is achieved only with respect to CPU time among concurrently executing tasks; other effects, such as cache interference, cannot be controlled. The nice fact is that a misbehaving task, one that tries to exceed, or happens to deviate from, its allowed CPU fraction, is throttled and cannot influence other tasks' behavior.
As I will show during my talk (power aware deadline scheduling), other techniques are required when a task's execution time is not strictly known beforehand, whether due to interference or to intrinsic variability in the performed activity. They fall in the domain of adaptive/feedback scheduling.
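
The throttling behaviour described above can be sketched with a deliberately simplified Python model: the task gets at most its runtime in each period, and anything beyond that is deferred to later periods. This is not the enforcement code from the patchset; the per-period replenishment, the function name and the numbers are assumptions made for illustration.

```python
def run_with_budget(demands_us, runtime_us):
    """Simulate a task limited to `runtime_us` of CPU time per period.

    `demands_us[i]` is the CPU time the task newly asks for in period i.
    Returns (granted, backlog): the time actually executed in each
    period and the work still pending after the last one.
    """
    granted = []
    backlog = 0
    for demand in demands_us:
        wanted = backlog + demand
        got = min(wanted, runtime_us)  # throttled once the budget is spent
        granted.append(got)
        backlog = wanted - got
    return granted, backlog


# A task with a 3000 us budget that bursts to 9000 us in the second period.
granted, backlog = run_with_budget([2000, 9000, 1000, 0], 3000)
```

Whatever the task asks for, it never executes more than 3000 us in a period in this model, so the remaining CPU time in each period stays available to the other tasks. The overrun only delays the misbehaving task itself.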

My point is that getting the task load right or at least better is a
fundamental requirement for improving power aware scheduling.


Fully agree :-).

As I said, I just wanted to add something; sorry if I misinterpreted the purpose of this discussion.

Best Regards,

- Juri Lelli