[RFC PATCH] Turn off the tick even when not idle

From: Josh Triplett
Date: Tue Sep 01 2009 - 12:04:21 EST

The following patch (not for application any time soon) hacks away the
timer interrupt even when not idle, by triggering the nohz mechanism
even if not running the idle task.

When a process does some number crunching for a while, without involving
the kernel, the kernel still interrupts it HZ times per second to figure
out if it has any work to do. With a system dedicated to doing such
number crunching, the answer will almost always come up "no"; however,
the kernel takes a while figuring out all the "no"s from various
subsystems, every timer tick. On my system, the timer tick takes about
80us, every 1/HZ seconds; that represents a significant overhead. With
HZ=1000, for instance, 80us out of every 1ms means 8% overhead.
Furthermore, the time taken varies, and the timer interrupts lead to
jitter in the performance of the number crunching.
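
To make the overhead arithmetic concrete, here is a minimal sketch. The
helper name tick_overhead_fraction() is hypothetical (not kernel code),
and it assumes the per-tick cost stays constant regardless of HZ, which
the measurement above does not establish.

```c
#include <assert.h>

/*
 * Fraction of each CPU-second spent servicing the periodic tick.
 * Hypothetical helper for illustration only; assumes a fixed cost
 * per tick, in microseconds.
 */
double tick_overhead_fraction(double tick_cost_us, unsigned int hz)
{
	assert(hz > 0);
	/* hz ticks per second, each costing tick_cost_us, out of 1e6 us. */
	return tick_cost_us * (double)hz / 1e6;
}
```

With the 80us figure, tick_overhead_fraction(80.0, 1000) gives 0.08
(the 8% above); under the same constant-cost assumption, HZ=250 would
give 0.02.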

This patch represents an attempt to demonstrate the effect of removing
the timer interrupt. It by no means represents a complete solution; it
just thwacks the timer interrupt over the head, ignoring the various
things it does. Known issues include breaking RCU, process accounting
(using "300%" of one CPU), and POSIX CPU timers, among other things. I
have some fixes in progress for some of those.

Nevertheless, this patch successfully boots, runs, and demonstrates some
good results. I ran the benchmark "Fixed Time Quantum" (ftq), which
repeatedly runs fixed length intervals and counts how many iterations of
a simple loop it can run within those intervals. I've attached a plot
of the results with HZ=1000, HZ=250, and this nohz hack; also available
at http://master.kernel.org/~josh/nohz-hack/ along with the raw numbers.
I sorted the samples by iterations completed, to group similar values
together. (The ~5 bad samples on the far left represent unavoidable
SMIs on the laptop I ran the tests on.)
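
For readers unfamiliar with this style of benchmark, the following is
a rough sketch of the idea, not the actual ftq source. The function
names are made up for illustration, and it uses clock_gettime() with
CLOCK_MONOTONIC as the time source, which is an assumption; the real
benchmark may time its intervals differently. It counts iterations of a
trivial loop per fixed-length quantum, then sorts the samples as
described above.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Monotonic time in nanoseconds. */
static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Count iterations of a trivial loop within one quantum. */
static uint64_t count_in_quantum(uint64_t quantum_ns)
{
	uint64_t end = now_ns() + quantum_ns;
	uint64_t count = 0;

	while (now_ns() < end)
		count++;
	return count;
}

/* qsort() comparator: ascending order of iteration counts. */
static int cmp_u64(const void *a, const void *b)
{
	uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;

	return (x > y) - (x < y);
}

/*
 * Hypothetical driver: run n quanta of quantum_ns each, then sort the
 * per-quantum counts so that similar values group together.
 */
void ftq_sample_sorted(uint64_t *samples, size_t n, uint64_t quantum_ns)
{
	size_t i;

	assert(samples != NULL);
	for (i = 0; i < n; i++)
		samples[i] = count_in_quantum(quantum_ns);
	qsort(samples, n, sizeof(*samples), cmp_u64);
}
```

Any interrupt that lands inside a quantum steals time from the loop and
lowers that sample's count, which is why tick overhead and jitter show
up directly in the sorted plot.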

Notice how with the timer tick turned off, the results show long
"shelves" of near-identical values. More than half the samples fall
into one such shelf, each completing hundreds of thousands of
iterations, all within ~20 iterations of each other. With the timer
tick turned on, the results spread out a lot more, in the direction of
worse performance.

Please, give this patch a try and let me know what you think.

I'd like to work towards a patch which really can kill off the timer
tick, making the kernel entirely event-driven and removing the polling
that occurs in the timer tick. I've reviewed everything the timer tick
does, and every last bit of it could occur using an event-driven

- Josh Triplett

-- >8 --

kernel/softirq.c | 2 +-
kernel/time/tick-sched.c | 8 --------
2 files changed, 1 insertions(+), 9 deletions(-)

diff --git a/kernel/softirq.c b/kernel/softirq.c
index eb5e131..8bf11b4 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -305,7 +305,7 @@ void irq_exit(void)
 	/* Make sure that timer wheel updates are propagated */
-	if (idle_cpu(smp_processor_id()) && !in_interrupt() && !need_resched())
+	if (!in_interrupt() && !need_resched())
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index e0f59a2..707ba98 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -223,14 +223,6 @@ void tick_nohz_stop_sched_tick(int inidle)
 	cpu = smp_processor_id();
 	ts = &per_cpu(tick_cpu_sched, cpu);

-	/*
-	 * Call to tick_nohz_start_idle stops the last_update_time from being
-	 * updated. Thus, it must not be called in the event we are called from
-	 * irq_exit() with the prior state different than idle.
-	 */
-	if (!inidle && !ts->inidle)
-		goto end;
 	now = tick_nohz_start_idle(ts);


Attachment: nohz-hack.png
Description: PNG image