[PATCH v3 12/12] sched/fair: Enable increased scale for kernel load

From: Yuyang Du
Date: Tue May 03 2016 - 23:45:47 EST


The increased scale, or precision, of kernel load has been disabled
since commit e4c2fb0d5776 ("sched: Disable (revert) SCHED_LOAD_SCALE
increase"). But we do need it when we have task groups, especially on
bigger machines; otherwise, we are likely to run out of precision when
distributing load.
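
To see the problem concretely, consider the sketch below (plain
user-space C, not kernel code; the CPU count is illustrative, and the
nice +19 weight of 15 mirrors sched_prio_to_weight[]). It divides a low
group weight across CPUs: at the current 10-bit resolution the per-CPU
share truncates to zero, while at the 20-bit kernel scale the ratio
survives:

  #include <stdio.h>

  #define FIXEDPOINT_SHIFT  10  /* mirrors SCHED_FIXEDPOINT_SHIFT */

  int main(void)
  {
          unsigned long weight = 15;  /* nice +19 weight from sched_prio_to_weight[] */
          unsigned long ncpus = 64;   /* illustrative machine size */

          /* 10-bit user scale: the per-CPU share truncates to 0 */
          printf("10-bit: %lu\n", weight / ncpus);

          /* 20-bit kernel scale: prints 240, the ratio survives */
          printf("20-bit: %lu\n", (weight << FIXEDPOINT_SHIFT) / ncpus);

          return 0;
  }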

So we reinstate it, and resolve to fix whatever power regression may
be seen.
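
The new comment below documents two identities relating user weight and
kernel load. As a quick sanity check, they can be reproduced in a few
lines of user-space C; this is only a sketch assuming the 64-bit
CONFIG_FAIR_GROUP_SCHED values, with nice_0_weight standing in for
sched_prio_to_weight[USER_PRIO(NICE_TO_PRIO(0))]:

  #include <assert.h>
  #include <stdio.h>

  #define SCHED_FIXEDPOINT_SHIFT  10
  #define NICE_0_LOAD_SHIFT       (SCHED_FIXEDPOINT_SHIFT + SCHED_FIXEDPOINT_SHIFT)
  #define NICE_0_LOAD             (1UL << NICE_0_LOAD_SHIFT)
  #define user_to_kernel_load(w)  ((w) << SCHED_FIXEDPOINT_SHIFT)
  #define kernel_to_user_load(w)  ((w) >> SCHED_FIXEDPOINT_SHIFT)

  int main(void)
  {
          unsigned long nice_0_weight = 1024;  /* the nice-0 user weight */

          /* user -> kernel: 1024 << 10 == 1 << 20 == NICE_0_LOAD */
          assert(user_to_kernel_load(nice_0_weight) == NICE_0_LOAD);
          /* kernel -> user: (1 << 20) >> 10 == 1024 */
          assert(kernel_to_user_load(NICE_0_LOAD) == nice_0_weight);

          printf("NICE_0_LOAD = %lu\n", NICE_0_LOAD);  /* prints 1048576 */
          return 0;
  }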

Suggested-by: Ingo Molnar <mingo@xxxxxxxxxx>
Suggested-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
Signed-off-by: Yuyang Du <yuyang.du@xxxxxxxxx>
---
kernel/sched/sched.h | 51 +++++++++++++++++++++++++-------------------------
1 file changed, 25 insertions(+), 26 deletions(-)

diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 871da67..5f66a2c 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -42,37 +42,36 @@ static inline void cpu_load_update_active(struct rq *this_rq) { }
#define NS_TO_JIFFIES(TIME) ((unsigned long)(TIME) / (NSEC_PER_SEC / HZ))

/*
- * Increase resolution of nice-level calculations for 64-bit architectures.
- * The extra resolution improves shares distribution and load balancing of
- * low-weight task groups (eg. nice +19 on an autogroup), deeper taskgroup
- * hierarchies, especially on larger systems. This is not a user-visible change
- * and does not change the user-interface for setting shares/weights.
+ * Task weight (visible and set by user) and its load (invisible to user)
+ * can have independent ranges. We increase the scale of kernel load on
+ * 64-bit architectures with task groups (CONFIG_FAIR_GROUP_SCHED). The
+ * extra precision improves share distribution and load balancing of
+ * low-weight task groups (e.g., nice +19 on an autogroup) and of deeper
+ * taskgroup hierarchies, especially on larger systems. This is not a
+ * user-visible change and does not change the user interface for setting
+ * shares/weights. We increase resolution only if we have enough bits to
+ * allow it (i.e., BITS_PER_LONG > 32); the cost of doing so when
+ * BITS_PER_LONG <= 32 is pretty high and the returns do not justify it.
*
- * We increase resolution only if we have enough bits to allow this increased
- * resolution (i.e. BITS_PER_LONG > 32). The costs for increasing resolution
- * when BITS_PER_LONG <= 32 are pretty high and the returns do not justify the
- * increased costs.
- */
-#if 0 /* BITS_PER_LONG > 32 -- currently broken: it increases power usage under light load */
-# define NICE_0_LOAD_SHIFT (SCHED_FIXEDPOINT_SHIFT + SCHED_FIXEDPOINT_SHIFT)
-# define user_to_kernel_load(w) ((w) << SCHED_FIXEDPOINT_SHIFT)
-# define kernel_to_user_load(w) ((w) >> SCHED_FIXEDPOINT_SHIFT)
-#else
-# define NICE_0_LOAD_SHIFT (SCHED_FIXEDPOINT_SHIFT)
-# define user_to_kernel_load(w) (w)
-# define kernel_to_user_load(w) (w)
-#endif
-
-/*
- * Task weight (visible to user) and its load (invisible to user) have
- * independent resolution, but they should be well calibrated. We use
- * user_to_kernel_load() and kernel_to_user_load(w) to convert between
- * them. The following must be true:
+ * Therefore, user load and kernel load should be well calibrated against
+ * each other, so that they are easily converted. We use user_to_kernel_load()
+ * and kernel_to_user_load() to convert between them.
*
+ * The following identities illustrate their relationship:
* user_to_kernel_load(sched_prio_to_weight[USER_PRIO(NICE_TO_PRIO(0))]) == NICE_0_LOAD
* kernel_to_user_load(NICE_0_LOAD) == sched_prio_to_weight[USER_PRIO(NICE_TO_PRIO(0))]
*/
-#define NICE_0_LOAD (1L << NICE_0_LOAD_SHIFT)
+#if defined(CONFIG_64BIT) && defined(CONFIG_FAIR_GROUP_SCHED)
+#define NICE_0_LOAD_SHIFT (SCHED_FIXEDPOINT_SHIFT + SCHED_FIXEDPOINT_SHIFT)
+#define user_to_kernel_load(w) ((w) << SCHED_FIXEDPOINT_SHIFT)
+#define kernel_to_user_load(w) ((w) >> SCHED_FIXEDPOINT_SHIFT)
+#else
+#define NICE_0_LOAD_SHIFT (SCHED_FIXEDPOINT_SHIFT)
+#define user_to_kernel_load(w) (w)
+#define kernel_to_user_load(w) (w)
+#endif
+
+#define NICE_0_LOAD (1UL << NICE_0_LOAD_SHIFT)

/*
* Single value that decides SCHED_DEADLINE internal math precision.
--
1.7.9.5