[wake_affine fixes/improvements 3/3] sched: introduce sched_feat(NO_HOT_AFFINE)

From: Paul Turner
Date: Fri Jan 14 2011 - 21:03:22 EST


Re-introduce the cache-cold requirement for affine wake-up balancing.

With the much more aggressive migration cost (currently 0.5ms), the balance
appears to have tilted in favour of not performing affine migrations for
cache_hot tasks.

Since the update_rq_clock() path is more expensive now (and the 'hot' window
is so small), avoid hammering it in the common case where the (possibly
slightly stale) rq->clock_task value has already advanced far enough to
invalidate hotness.

Signed-off-by: Paul Turner <pjt@xxxxxxxxxx>

---
 kernel/sched_fair.c     |   20 +++++++++++++++++++-
 kernel/sched_features.h |    5 +++++
 2 files changed, 24 insertions(+), 1 deletion(-)

Index: tip3/kernel/sched_fair.c
===================================================================
--- tip3.orig/kernel/sched_fair.c
+++ tip3/kernel/sched_fair.c
@@ -1376,6 +1376,23 @@ task_hot(struct task_struct *p, u64 now)
 	return delta < (s64)sysctl_sched_migration_cost;
 }

+/*
+ * Since sched_migration_cost is (relatively) very small we only need to
+ * actually update the clock in the boundary case when determining whether a
+ * task is hot or not.
+ */
+static int task_hot_lazy(struct task_struct *p)
+{
+	struct rq *rq = task_rq(p);
+
+	if (!task_hot(p, rq->clock_task))
+		return 0;
+
+	update_rq_clock(rq);
+
+	return task_hot(p, rq->clock_task);
+}
+
#ifdef CONFIG_FAIR_GROUP_SCHED
/*
* effective_load() calculates the load change as seen from the root_task_group
@@ -1664,7 +1681,8 @@ select_task_rq_fair(struct rq *rq, struc
 	int sync = wake_flags & WF_SYNC;
 
 	if (sd_flag & SD_BALANCE_WAKE) {
-		if (cpumask_test_cpu(cpu, &p->cpus_allowed))
+		if (cpumask_test_cpu(cpu, &p->cpus_allowed) &&
+		    (!sched_feat(NO_HOT_AFFINE) || !task_hot_lazy(p)))
 			want_affine = 1;
 		new_cpu = prev_cpu;
 	}
Index: tip3/kernel/sched_features.h
===================================================================
--- tip3.orig/kernel/sched_features.h
+++ tip3/kernel/sched_features.h
@@ -64,3 +64,8 @@ SCHED_FEAT(OWNER_SPIN, 1)
* Decrement CPU power based on irq activity
*/
SCHED_FEAT(NONIRQ_POWER, 1)
+
+/*
+ * Don't consider cache-hot tasks for affine wakeups
+ */
+SCHED_FEAT(NO_HOT_AFFINE, 1)
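
For anyone who wants to poke at the lazy check outside the kernel, here is a
rough userspace sketch of the idea behind task_hot_lazy(). It is not kernel
code: struct sim_rq, struct sim_task and all sim_* helpers are hypothetical
stand-ins, the hotness test is reduced to the migration-cost comparison only,
and the 0.5ms value is simply taken from the changelog. The sketch relies on
the assumption that a stale clock never runs ahead of the real one, so a task
that already looks cold against the stale value is cold against the fresh
value too, and the expensive refresh is only needed when the stale value
still says "hot".

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the runqueue: a cached clock plus the "real" one. */
struct sim_rq {
	uint64_t clock_task;	/* cached value, possibly slightly stale (ns) */
	uint64_t hw_clock;	/* what a refresh would read right now (ns) */
	unsigned int updates;	/* how many times we paid for a refresh */
};

/* Hypothetical stand-in for the task: only the field the check needs. */
struct sim_task {
	uint64_t exec_start;	/* when the task last started running (ns) */
};

/* Assumed migration cost: 0.5ms, expressed in nanoseconds. */
static const int64_t sim_migration_cost = 500000;

/* Models the expensive clock refresh and counts how often it happens. */
static void sim_update_rq_clock(struct sim_rq *rq)
{
	assert(rq);
	rq->clock_task = rq->hw_clock;
	rq->updates++;
}

/* Simplified hotness test: ran more recently than the migration cost. */
static int sim_task_hot(const struct sim_task *p, uint64_t now)
{
	int64_t delta = (int64_t)(now - p->exec_start);

	return delta < sim_migration_cost;
}

/* Only refresh the clock in the boundary case where the stale value says hot. */
static int sim_task_hot_lazy(const struct sim_task *p, struct sim_rq *rq)
{
	if (!sim_task_hot(p, rq->clock_task))
		return 0;	/* already cold on the stale clock: no refresh */

	sim_update_rq_clock(rq);

	return sim_task_hot(p, rq->clock_task);
}
```

Calling sim_task_hot_lazy() on a task whose exec_start is more than 0.5ms
behind the cached clock_task returns 0 and leaves the updates counter
untouched, which is the common case the patch tries to keep cheap. Only a
task that still looks hot on the cached value triggers a refresh.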

