Re: [RFCv3 PATCH 00/48] sched: Energy cost model for energy-aware scheduling

From: Vincent Guittot
Date: Thu Apr 02 2015 - 08:44:00 EST


On 4 February 2015 at 19:30, Morten Rasmussen <morten.rasmussen@xxxxxxx> wrote:
> Several techniques for saving energy through various scheduler
> modifications have been proposed in the past, however most of the
> techniques have not been universally beneficial for all use-cases and
> platforms. For example, consolidating tasks on fewer cpus is an
> effective way to save energy on some platforms, while it might make
> things worse on others.
>
> This proposal, which is inspired by the Ksummit workshop discussions in
> 2013 [1], takes a different approach by using a (relatively) simple
> platform energy cost model to guide scheduling decisions. By providing
> the model with platform specific costing data the model can provide a
> estimate of the energy implications of scheduling decisions. So instead
> of blindly applying scheduling techniques that may or may not work for
> the current use-case, the scheduler can make informed energy-aware
> decisions. We believe this approach provides a methodology that can be
> adapted to any platform, including heterogeneous systems such as ARM
> big.LITTLE. The model considers cpus only, i.e. no peripherals, GPU or
> memory. Model data includes power consumption at each P-state and
> C-state.
>
> This is an RFC and there are some loose ends that have not been
> addressed here or in the code yet. The model and its infrastructure is
> in place in the scheduler and it is being used for load-balancing
> decisions. The energy model data is hardcoded, the load-balancing
> heuristics are still under development, and there are some limitations
> still to be addressed. However, the main idea is presented here, which
> is the use of an energy model for scheduling decisions.
>
> RFCv3 is a consolidation of the latest energy model related patches and
> previously posted patch sets related to capacity and utilization
> tracking [2][3] to show where we are heading. [2] and [3] have been
> rebased onto v3.19-rc7 with a few minor modifications. Large parts of
> the energy model code and use of the energy model in the scheduler has
> been rewritten and simplified. The patch set consists of three main
> parts (more details further down):
>
> Patch 1-11: sched: consolidation of CPU capacity and usage [2] (rebase)
>
> Patch 12-19: sched: frequency and cpu invariant per-entity load-tracking
> and other load-tracking bits [3] (rebase)
>
> Patch 20-48: sched: Energy cost model for energy-aware scheduling (RFCv3)


Hi Morten,

48 patches is a big number of patches and when i look into your
patchset, some feature are quite self contained. IMHO it would be
worth splitting it in smaller patchsets in order to ease the review
and the regression test.
>From a 1st look at your patchset , i have found
-patches 11,13,14 and 15 are only linked to frequency scaling invariance
-patches 12, 17 and 17 are only about adding cpu scaling invariance
-patches 18 and 19 are about tracking and adding the blocked
utilization in the CPU usage
-patches 20 to the end is linked the EAS

Regards,
Vincent

>
> Test results for ARM TC2 (2xA15+3xA7) with cpufreq enabled:
>
> sysbench: Single task running for 3 seconds.
> rt-app [4]: 5 medium (~50%) periodic tasks
> rt-app [4]: 2 light (~10%) periodic tasks
>
> Average numbers for 20 runs per test.
>
> Energy sysbench rt-app medium rt-app light
> Mainline 100* 100 100
> EA 279 88 63
>
> * Sensitive to task placement on big.LITTLE. Mainline may put it on
> either cpu due to it's lack of compute capacity awareness, while EA
> consistently puts heavy tasks on big cpus. The EA energy increase came
> with a 2.65x _increase_ in performance (throughput).
>
> [1] http://etherpad.osuosl.org/energy-aware-scheduling-ks-2013 (search
> for 'cost')
> [2] https://lkml.org/lkml/2015/1/15/136
> [3] https://lkml.org/lkml/2014/12/2/328
> [4] https://wiki.linaro.org/WorkingGroups/PowerManagement/Resources/Tools/WorkloadGen
>
> Changes:
>
> RFCv3:
>
> 'sched: Energy cost model for energy-aware scheduling' changes:
> RFCv2->RFCv3:
>
> (1) Remove frequency- and cpu-invariant load/utilization patches since
> this is now provided by [2] and [3].
>
> (2) Remove system-wide sched_energy to make the code easier to
> understand, i.e. single socket systems are not supported (yet).
>
> (3) Remove wake-up energy. Extra complexity that wasn't fully justified.
> Idle-state awareness introduced recently in mainline may be
> sufficient.
>
> (4) Remove procfs interface for energy data to make the patch-set
> smaller.
>
> (5) Rework energy-aware load balancing code.
>
> In RFCv2 we only attempted to pick the source cpu in an energy-aware
> fashion. In addition to support for finding the most energy
> inefficient source CPU during the load-balancing action, RFCv3 also
> introduces the energy-aware based moving of tasks between cpus as
> well as support for managing the 'tipping point' - the threshold
> where we switch away from energy model based load balancing to
> conventional load balancing.
>
> 'sched: frequency and cpu invariant per-entity load-tracking and other
> load-tracking bits' [3]
>
> (1) Remove blocked load from load tracking.
>
> (2) Remove cpu-invariant load tracking.
>
> Both (1) and (2) require changes to the existing load-balance code
> which haven't been done yet. These are therefore left out until that
> has been addressed.
>
> (3) One patch renamed.
>
> 'sched: consolidation of CPU capacity and usage' [2]
>
> (1) Fixed conflict when rebasing to v3.19-rc7.
>
> (2) One patch subject changed slightly.
>
>
> RFC v2:
> - Extended documentation:
> - Cover the energy model in greater detail.
> - Recipe for deriving platform energy model.
> - Replaced Kconfig with sched feature (jump label).
> - Add unweighted load tracking.
> - Use unweighted load as task/cpu utilization.
> - Support for multiple idle states per sched_group. cpuidle integration
> still missing.
> - Changed energy aware functionality in select_idle_sibling().
> - Experimental energy aware load-balance support.
>
>
> Dietmar Eggemann (17):
> sched: Make load tracking frequency scale-invariant
> sched: Make usage tracking cpu scale-invariant
> arm: vexpress: Add CPU clock-frequencies to TC2 device-tree
> arm: Cpu invariant scheduler load-tracking support
> sched: Get rid of scaling usage by cpu_capacity_orig
> sched: Introduce energy data structures
> sched: Allocate and initialize energy data structures
> arm: topology: Define TC2 energy and provide it to the scheduler
> sched: Infrastructure to query if load balancing is energy-aware
> sched: Introduce energy awareness into update_sg_lb_stats
> sched: Introduce energy awareness into update_sd_lb_stats
> sched: Introduce energy awareness into find_busiest_group
> sched: Introduce energy awareness into find_busiest_queue
> sched: Introduce energy awareness into detach_tasks
> sched: Tipping point from energy-aware to conventional load balancing
> sched: Skip cpu as lb src which has one task and capacity gte the dst
> cpu
> sched: Turn off fast idling of cpus on a partially loaded system
>
> Morten Rasmussen (23):
> sched: Track group sched_entity usage contributions
> sched: Make sched entity usage tracking frequency-invariant
> cpufreq: Architecture specific callback for frequency changes
> arm: Frequency invariant scheduler load-tracking support
> sched: Track blocked utilization contributions
> sched: Include blocked utilization in usage tracking
> sched: Documentation for scheduler energy cost model
> sched: Make energy awareness a sched feature
> sched: Introduce SD_SHARE_CAP_STATES sched_domain flag
> sched: Compute cpu capacity available at current frequency
> sched: Relocated get_cpu_usage()
> sched: Use capacity_curr to cap utilization in get_cpu_usage()
> sched: Highest energy aware balancing sched_domain level pointer
> sched: Calculate energy consumption of sched_group
> sched: Extend sched_group_energy to test load-balancing decisions
> sched: Estimate energy impact of scheduling decisions
> sched: Energy-aware wake-up task placement
> sched: Bias new task wakeups towards higher capacity cpus
> sched, cpuidle: Track cpuidle state index in the scheduler
> sched: Count number of shallower idle-states in struct
> sched_group_energy
> sched: Determine the current sched_group idle-state
> sched: Enable active migration for cpus of lower capacity
> sched: Disable energy-unfriendly nohz kicks
>
> Vincent Guittot (8):
> sched: add utilization_avg_contrib
> sched: remove frequency scaling from cpu_capacity
> sched: make scale_rt invariant with frequency
> sched: add per rq cpu_capacity_orig
> sched: get CPU's usage statistic
> sched: replace capacity_factor by usage
> sched: add SD_PREFER_SIBLING for SMT level
> sched: move cfs task on a CPU with higher capacity
>
> Documentation/scheduler/sched-energy.txt | 359 +++++++++++
> arch/arm/boot/dts/vexpress-v2p-ca15_a7.dts | 5 +
> arch/arm/kernel/topology.c | 218 +++++--
> drivers/cpufreq/cpufreq.c | 10 +-
> include/linux/sched.h | 43 +-
> kernel/sched/core.c | 119 +++-
> kernel/sched/debug.c | 12 +-
> kernel/sched/fair.c | 935 ++++++++++++++++++++++++-----
> kernel/sched/features.h | 6 +
> kernel/sched/idle.c | 2 +
> kernel/sched/sched.h | 75 ++-
> 11 files changed, 1559 insertions(+), 225 deletions(-)
> create mode 100644 Documentation/scheduler/sched-energy.txt
>
> --
> 1.9.1
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/