Re: power-efficient scheduling design

From: Catalin Marinas
Date: Mon Jun 17 2013 - 07:24:29 EST


On Fri, Jun 14, 2013 at 05:05:22PM +0100, Morten Rasmussen wrote:
> The intention is that the power scheduler will implement the (unified)
> power policy. It gets the current load of the system from the scheduler.
> Based on this information it will adjust the compute capacity available
> to the scheduler and drive frequency changes such that enough compute
> capacity is available to handle the current load. If the total load can
> be handled by a subset of cpus, it will reduce the capacity of the
> excess cpus to 0 (cpu_power=1). Likewise, if the load increases it will
> increase capacity of one or more idle cpus to allow the scheduler to
> spread the load. The power scheduler has knowledge about the power
> topology and will guide the scheduler to idle the optimal cpus by
> reducing their capacity. Global idle decisions will be handled by the power
> scheduler, so cpuidle can over time be reduced to become just a driver,
> once we have added C-state selection to the power scheduler.
>
> The scheduler is left to focus on scheduling mechanics and finding the
> best possible load balance on the cpu capacities set by the power
> scheduler. It will share a detailed view of the current load with the
> power scheduler to enable it to make the right capacity adjustments. The
> scheduler will need some optimization to cope better with asymmetric
> compute capacities. We may want to reduce the capacity of some cpus to
> increase their idle time while letting others take the majority of the
> load.
...
> I'm aware that the scheduler and power scheduler decisions may be
> inextricably linked so we may decide to merge them. However, I think it
> is worth trying to keep the power scheduling decisions out of the
> scheduler until we have proven it infeasible.

Thanks for posting this, I agree with the proposal. I would like to
emphasise that this is rather a "divide and conquer" approach to
reaching a unified solution. Some of the steps involved (not necessarily
in this order):

1. Introduction of a power scheduler (replacing cpufreq governor) aware
of the overall load and CPU capacities. It requests CPU frequency
changes from the low-level cpufreq driver and gives hints to the task
scheduler about load asymmetry (via cpu_power).
2. More accurate task load tracking (one attempt is here:
https://lkml.org/lkml/2013/4/16/289, though CPU cycles or other
arch-specific counters may give better accuracy).
3. Load balancer improvements for asymmetric CPU performance levels
(e.g. frequency scaling).
4. Power scheduler driving the CPU idle decisions (replacing the cpuidle
governor).
5. Increased power scheduler awareness of the run-queue contents
(number of tasks, individual task loads) and load balancer behaviour,
feeding extra hints back to the load balancer (e.g. only move tasks
below/above a certain load, trigger a load balance).
6. Performance vs power saving tuning (policies).
7. More specific optimisations based on the CPU topology (big.LITTLE,
turbo boost, etc.).
?. Lots of other things based on testing and community reviews.
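
To make step 1 a bit more concrete, here is a minimal sketch of the
capacity decision, written in Python for brevity rather than as kernel
code. It only shows the "how many CPUs does the current load need" part;
frequency selection is left out. The names assign_cpu_power,
CPU_POWER_FULL, CPU_POWER_OFF and the headroom parameter are
hypothetical, and the value 1024 for a full-capacity CPU is an
assumption. Only cpu_power=1 for an idled CPU comes from the quoted
proposal.

```python
import math

# Illustrative sketch only, not kernel code. All names are hypothetical
# except cpu_power, which the thread uses for the per-CPU capacity hint.
CPU_POWER_FULL = 1024  # assumed capacity of one CPU at full performance
CPU_POWER_OFF = 1      # "capacity 0" is expressed as cpu_power=1


def assign_cpu_power(total_load, nr_cpus, headroom=1.25):
    """Return a per-CPU cpu_power list sized for total_load.

    total_load is in the same units as CPU_POWER_FULL. headroom is an
    assumed safety margin so that a small load increase does not
    immediately saturate the active CPUs.
    """
    needed = total_load * headroom
    # Number of full-capacity CPUs required: at least one, at most all.
    active = max(1, min(nr_cpus, math.ceil(needed / CPU_POWER_FULL)))
    # Excess CPUs get the minimal capacity so the scheduler avoids them.
    return [CPU_POWER_FULL] * active + [CPU_POWER_OFF] * (nr_cpus - active)
```

With these assumptions, a total load of 1000 on four CPUs keeps two CPUs
at full capacity and sets cpu_power=1 on the other two. A real policy
would also use the power topology to choose which CPUs to idle, which
this sketch ignores.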

Step 5 above will further increase the coupling between the load
balancer and the power scheduler, and we could end up with a unified
implementation. But before then it is simpler to reason in terms of
(a) better load balancing in an asymmetric configuration and (b) the
CPU capacity needed for the overall load.
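
Point (a) can be illustrated with a small sketch, again in Python and
not kernel code. It assumes the load balancer aims for a per-CPU load
proportional to each CPU's cpu_power; the function names target_loads
and imbalance are hypothetical, and the actual load balancer is
considerably more involved.

```python
# Illustrative sketch only. Assumes the balancing goal is a load share
# proportional to cpu_power; function names are hypothetical.

def target_loads(total_load, cpu_power):
    """Split total_load across CPUs in proportion to their cpu_power."""
    total_power = sum(cpu_power)
    return [total_load * p / total_power for p in cpu_power]


def imbalance(loads, cpu_power):
    """Per-CPU difference between current load and proportional target.

    A positive value means the CPU carries more than its share and is a
    candidate to have tasks moved away from it.
    """
    targets = target_loads(sum(loads), cpu_power)
    return [load - target for load, target in zip(loads, targets)]
```

For example, with cpu_power values of 1024 and 512 and all 900 units of
load on the first CPU, the proportional targets are 600 and 300, so the
first CPU is over its share by 300. A CPU whose cpu_power has been
reduced to 1 ends up with a target close to zero.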

--
Catalin