[tip:sched/core] sched/fair: Introduce an energy estimation helper function

From: tip-bot for Quentin Perret
Date: Tue Dec 11 2018 - 10:38:59 EST


Commit-ID: 390031e4c309c94ecc07a558187eb5185200df83
Gitweb: https://git.kernel.org/tip/390031e4c309c94ecc07a558187eb5185200df83
Author: Quentin Perret <quentin.perret@xxxxxxx>
AuthorDate: Mon, 3 Dec 2018 09:56:26 +0000
Committer: Ingo Molnar <mingo@xxxxxxxxxx>
CommitDate: Tue, 11 Dec 2018 15:17:02 +0100

sched/fair: Introduce an energy estimation helper function

In preparation for the definition of an energy-aware wakeup path,
introduce a helper function to estimate the consequence on system energy
when a specific task wakes-up on a specific CPU. compute_energy()
estimates the capacity state to be reached by all performance domains
and estimates the consumption of each online CPU according to its Energy
Model and its percentage of busy time.

Signed-off-by: Quentin Perret <quentin.perret@xxxxxxx>
Signed-off-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
Cc: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Cc: Mike Galbraith <efault@xxxxxx>
Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Cc: adharmap@xxxxxxxxxxxxxx
Cc: chris.redpath@xxxxxxx
Cc: currojerez@xxxxxxxxxx
Cc: dietmar.eggemann@xxxxxxx
Cc: edubezval@xxxxxxxxx
Cc: gregkh@xxxxxxxxxxxxxxxxxxx
Cc: javi.merino@xxxxxxxxxx
Cc: joel@xxxxxxxxxxxxxxxxx
Cc: juri.lelli@xxxxxxxxxx
Cc: morten.rasmussen@xxxxxxx
Cc: patrick.bellasi@xxxxxxx
Cc: pkondeti@xxxxxxxxxxxxxx
Cc: rjw@xxxxxxxxxxxxx
Cc: skannan@xxxxxxxxxxxxxx
Cc: smuckle@xxxxxxxxxx
Cc: srinivas.pandruvada@xxxxxxxxxxxxxxx
Cc: thara.gopinath@xxxxxxxxxx
Cc: tkjos@xxxxxxxxxx
Cc: valentin.schneider@xxxxxxx
Cc: vincent.guittot@xxxxxxxxxx
Cc: viresh.kumar@xxxxxxxxxx
Link: https://lkml.kernel.org/r/20181203095628.11858-14-quentin.perret@xxxxxxx
Signed-off-by: Ingo Molnar <mingo@xxxxxxxxxx>
---
kernel/sched/fair.c | 76 +++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 76 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 767e7675774b..b3c94584d947 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6377,6 +6377,82 @@ static int wake_cap(struct task_struct *p, int cpu, int prev_cpu)
return !task_fits_capacity(p, min_cap);
}

+/*
+ * Predicts what cpu_util(@cpu) would return if @p was migrated (and enqueued)
+ * to @dst_cpu.
+ */
+static unsigned long cpu_util_next(int cpu, struct task_struct *p, int dst_cpu)
+{
+ struct cfs_rq *cfs_rq = &cpu_rq(cpu)->cfs;
+ unsigned long util_est, util = READ_ONCE(cfs_rq->avg.util_avg);
+
+ /*
+ * If @p migrates from @cpu to another, remove its contribution. Or,
+ * if @p migrates from another CPU to @cpu, add its contribution. In
+ * the other cases, @cpu is not impacted by the migration, so the
+ * util_avg should already be correct.
+ */
+ if (task_cpu(p) == cpu && dst_cpu != cpu)
+ sub_positive(&util, task_util(p));
+ else if (task_cpu(p) != cpu && dst_cpu == cpu)
+ util += task_util(p);
+
+ if (sched_feat(UTIL_EST)) {
+ util_est = READ_ONCE(cfs_rq->avg.util_est.enqueued);
+
+ /*
+ * During wake-up, the task isn't enqueued yet and doesn't
+ * appear in the cfs_rq->avg.util_est.enqueued of any rq,
+ * so just add it (if needed) to "simulate" what will be
+ * cpu_util() after the task has been enqueued.
+ */
+ if (dst_cpu == cpu)
+ util_est += _task_util_est(p);
+
+ util = max(util, util_est);
+ }
+
+ return min(util, capacity_orig_of(cpu));
+}
+
+/*
+ * compute_energy(): Estimates the energy that would be consumed if @p was
+ * migrated to @dst_cpu. compute_energy() predicts what will be the utilization
+ * landscape of the * CPUs after the task migration, and uses the Energy Model
+ * to compute what would be the energy if we decided to actually migrate that
+ * task.
+ */
+static long
+compute_energy(struct task_struct *p, int dst_cpu, struct perf_domain *pd)
+{
+ long util, max_util, sum_util, energy = 0;
+ int cpu;
+
+ for (; pd; pd = pd->next) {
+ max_util = sum_util = 0;
+ /*
+ * The capacity state of CPUs of the current rd can be driven by
+ * CPUs of another rd if they belong to the same performance
+ * domain. So, account for the utilization of these CPUs too
+ * by masking pd with cpu_online_mask instead of the rd span.
+ *
+ * If an entire performance domain is outside of the current rd,
+ * it will not appear in its pd list and will not be accounted
+ * by compute_energy().
+ */
+ for_each_cpu_and(cpu, perf_domain_span(pd), cpu_online_mask) {
+ util = cpu_util_next(cpu, p, dst_cpu);
+ util = schedutil_energy_util(cpu, util);
+ max_util = max(util, max_util);
+ sum_util += util;
+ }
+
+ energy += em_pd_energy(pd->em_pd, max_util, sum_util);
+ }
+
+ return energy;
+}
+
/*
* select_task_rq_fair: Select target runqueue for the waking task in domains
* that have the 'sd_flag' flag set. In practice, this is SD_BALANCE_WAKE,