[RFC][PATCH] sched: Improve tick preemption

From: Peter Zijlstra
Date: Mon Sep 13 2010 - 14:04:26 EST


Tested on an SMP machine, but with the sysctl knobs adjusted as if it
were UP and everything run with schedtool -a1 (i.e. on a single CPU).

As the workload I used: make O=defconfig-build -j10 kernel/

(Full bzImage builds take forever on a single CPU; runs were done
cache-hot.)

Normal:

# for i in {latency,min_granularity,wakeup_granularity}; do cat sched_${i}_ns; done
6000000
2000000
1000000


# schedtool -a1 -e ./wakeup-latency
maximum latency: 22169.0 µs
average latency: 1559.8 µs
missed timer events: 0


# for i in {latency,min_granularity,wakeup_granularity}; do cat sched_${i}_ns; done
6000000
750000
1000000

# schedtool -a1 -e ./wakeup-latency
maximum latency: 11999.9 µs
average latency: 710.9 µs
missed timer events: 0


Patched:

# for i in {latency,min_granularity,wakeup_granularity}; do cat sched_${i}_ns; done
6000000
2000000
1000000

maximum latency: 18042.3 µs
average latency: 2729.3 µs
missed timer events: 0


# for i in {latency,min_granularity,wakeup_granularity}; do cat sched_${i}_ns; done
6000000
750000
1000000

maximum latency: 9985.8 µs
average latency: 551.4 µs
missed timer events: 0
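To make the normal/patched comparison easier to read, the small C helper below computes the relative change for each pair of latencies quoted above. It does nothing but arithmetic on those reported figures; the struct run table and the function names are illustrative and not part of wakeup-latency or schedtool.

```c
#include <assert.h>
#include <stdio.h>

/* Latencies in microseconds, copied from the wakeup-latency runs above. */
struct run {
	const char *label;
	double normal_us;
	double patched_us;
};

static const struct run runs[] = {
	{ "max, min_granularity=2000000", 22169.0, 18042.3 },
	{ "avg, min_granularity=2000000",  1559.8,  2729.3 },
	{ "max, min_granularity=750000",  11999.9,  9985.8 },
	{ "avg, min_granularity=750000",    710.9,   551.4 },
};

/* Relative change in percent; negative means the patched kernel was lower. */
static double pct_change(double before, double after)
{
	assert(before != 0.0);
	return (after - before) / before * 100.0;
}

/* Print one line per measurement with the signed percentage change. */
void print_changes(void)
{
	for (size_t i = 0; i < sizeof(runs) / sizeof(runs[0]); i++)
		printf("%-30s %+6.1f%%\n", runs[i].label,
		       pct_change(runs[i].normal_us, runs[i].patched_us));
}
```

Rounded to one decimal, this gives -18.6% and +75.0% (max and average) for the default min_granularity, and -16.8% and -22.4% for min_granularity=750000.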


Could others try to reproduce this while I run a few other
benchmarks?

---
Subject: sched: Improve tick preemption

Regular tick preemption has a few issues:

- it compares delta_exec (wall-time) with an unweighted measure
(min_gran)

- that min_gran might be too small for systems with a small number
of tasks.

Cure the first issue by instead comparing the vruntime (virtual time)
difference with this unweighted measure.
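Why the choice of clock matters can be illustrated with a simplified model of the wall-time to virtual-time conversion: vruntime advances by delta_exec scaled by NICE_0_LOAD / weight, so the same 2 ms of delta_exec means very different amounts of vruntime for tasks of different weight. This is a userspace sketch; wall_to_vruntime and NICE_0_WEIGHT are hypothetical names, and the kernel performs this scaling with fixed-point arithmetic rather than the plain division shown here.

```c
#include <assert.h>
#include <stdint.h>

/* Weight of a nice-0 task (1024, the kernel's NICE_0_LOAD). */
#define NICE_0_WEIGHT 1024ULL

/*
 * Hypothetical model: vruntime advances by delta_exec * NICE_0_WEIGHT / weight,
 * so heavier (lower nice) tasks accrue vruntime more slowly than wall time
 * and lighter (higher nice) tasks accrue it faster.
 */
static uint64_t wall_to_vruntime(uint64_t delta_exec_ns, uint64_t weight)
{
	assert(weight > 0);
	return delta_exec_ns * NICE_0_WEIGHT / weight;
}
```

With the kernel's nice-to-weight values (1024 for nice 0, 335 for nice +5, 3121 for nice -5), 2 ms of wall time becomes 2000000, 6113432 and 656199 ns of vruntime respectively.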

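Cure the second issue by computing the actual granularity for small
systems: when fewer than sched_nr_latency tasks are runnable, use
sysctl_sched_latency / nr_running instead of min_gran.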
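The values __sched_gran() produces for the two sysctl settings used in the benchmark can be checked with a userspace model. The helper names are hypothetical, the sysctls are passed in as parameters instead of read from globals, and deriving sched_nr_latency by rounding latency / min_granularity up is an assumption of this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of sched_nr_latency: ceil(latency / min_granularity). */
static uint64_t nr_latency(uint64_t latency_ns, uint64_t min_gran_ns)
{
	assert(min_gran_ns > 0);
	return (latency_ns + min_gran_ns - 1) / min_gran_ns;
}

/* Hypothetical userspace model of the __sched_gran() helper in this patch. */
static uint64_t sched_gran(unsigned long nr_running, uint64_t latency_ns,
			   uint64_t min_gran_ns)
{
	assert(nr_running > 0);
	/* Busy queue: fall back to the minimum granularity. */
	if (nr_running >= nr_latency(latency_ns, min_gran_ns))
		return min_gran_ns;
	/* Few tasks: split the latency target evenly between them. */
	return latency_ns / nr_running;
}
```

With sched_latency_ns=6000000 and sched_min_granularity_ns=2000000, two runnable tasks get a 3 ms granularity while three or more fall back to 2 ms; with sched_min_granularity_ns=750000 the even split applies up to seven tasks (857142 ns for seven).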

Reported-by: Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@xxxxxxxxx>
---
kernel/sched_fair.c | 17 +++++++++++++----
1 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index 9b5b4f8..0011622 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -457,6 +457,16 @@ static u64 __sched_period(unsigned long nr_running)
 	return period;
 }
 
+static u64 __sched_gran(unsigned long nr_running)
+{
+	unsigned long latency = sysctl_sched_latency;
+
+	if (nr_running >= sched_nr_latency)
+		return sysctl_sched_min_granularity;
+
+	return latency / nr_running;
+}
+
 /*
  * We calculate the wall-time slice from the period by taking a part
  * proportional to the weight.
@@ -865,14 +875,13 @@ check_preempt_tick(struct cfs_rq *cfs_rq, struct sched_entity *curr)
 	if (!sched_feat(WAKEUP_PREEMPT))
 		return;
 
-	if (delta_exec < sysctl_sched_min_granularity)
-		return;
-
 	if (cfs_rq->nr_running > 1) {
 		struct sched_entity *se = __pick_next_entity(cfs_rq);
 		s64 delta = curr->vruntime - se->vruntime;
+		if (delta < sysctl_sched_min_granularity)
+			return;
 
-		if (delta > ideal_runtime)
+		if (delta > __sched_gran(cfs_rq->nr_running))
 			resched_task(rq_of(cfs_rq)->curr);
 	}
 }
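What the second hunk changes can be seen by modeling the tail of check_preempt_tick() as a pure function, once with the old logic and once with the new. This is a userspace sketch: old_tail and new_tail are hypothetical names, all scheduler state is passed in as plain integers, and every comparison is done in signed 64-bit arithmetic, which does not reproduce the kernel's mixed signed/unsigned operand types exactly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of the tail of check_preempt_tick() (after the
 * WAKEUP_PREEMPT feature check). Inputs:
 *   delta_exec    - wall-clock ns curr has run in this slice (old logic only)
 *   delta         - curr's vruntime minus the leftmost entity's vruntime
 *   ideal_runtime - curr's wall-time slice (old logic only)
 *   gran          - what __sched_gran() would return (new logic only)
 * Returns true if curr would be rescheduled.
 */
static bool old_tail(int64_t delta_exec, int64_t delta, int64_t ideal_runtime,
		     int64_t min_gran, unsigned long nr_running)
{
	/* Old gate: wall time compared with the unweighted min_gran. */
	if (delta_exec < min_gran)
		return false;
	return nr_running > 1 && delta > ideal_runtime;
}

static bool new_tail(int64_t delta, int64_t gran, int64_t min_gran,
		     unsigned long nr_running)
{
	if (nr_running <= 1)
		return false;
	/* New gate: virtual-time lead compared with min_gran. */
	if (delta < min_gran)
		return false;
	return delta > gran;
}
```

With two runnable tasks, sched_latency_ns=6000000 and sched_min_granularity_ns=2000000 (so gran and ideal_runtime are both 3 ms for equal weights), a task that has run 1 ms of wall time but is 4 ms of vruntime ahead is not preempted by the old check, because delta_exec is below min_gran, but is preempted by the new one.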
