[RFC PATCH 0/4] timers: Timer migration

From: Arun R Bharadwaj
Date: Thu Oct 16 2008 - 07:43:01 EST


(This is merely for discussion purposes and not for inclusion).

This is a rework of the patch I sent earlier.
Link to the earlier patch - http://lkml.org/lkml/2008/9/16/40
I have made the following changes since that patch:

1) Created a new framework for identifying cpu-pinned hrtimers, so
that such hrtimers are ignored during timer migration.

2) A better sysfs interface which allows you to echo a target cpu
number to the per-cpu sysfs entry, after which all timers are migrated
to that cpu, instead of choosing cpu0 by default.
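
To make the two changes above concrete, here is a minimal user-space
sketch of the intended behaviour. It is not code from the patches:
struct model_timer, struct model_cpu, migrate_timers(), NCPUS and
MAX_TIMERS are all hypothetical names invented for illustration, and
the real hrtimer data structures are considerably more involved. The
sketch only shows the rule that every timer is moved to the requested
target cpu except the ones marked as cpu-pinned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NCPUS      4	/* hypothetical: number of cpus in the model */
#define MAX_TIMERS 8	/* hypothetical: per-cpu timer capacity */

/* Hypothetical stand-in for a timer; not the kernel's struct hrtimer. */
struct model_timer {
	unsigned long expires;
	bool pinned;		/* true: must stay on its current cpu */
};

/* Hypothetical stand-in for a per-cpu timer base. */
struct model_cpu {
	struct model_timer timers[MAX_TIMERS];
	size_t count;
};

/*
 * Move every non-pinned timer from cpu 'src' to cpu 'dst'.
 * Pinned timers, and timers that do not fit on 'dst', stay on 'src'.
 * Returns the number of timers moved.
 */
static size_t migrate_timers(struct model_cpu *cpus, int src, int dst)
{
	struct model_cpu *from, *to;
	size_t i, kept = 0, moved = 0;

	assert(src >= 0 && src < NCPUS);
	assert(dst >= 0 && dst < NCPUS);

	if (src == dst)
		return 0;

	from = &cpus[src];
	to = &cpus[dst];

	for (i = 0; i < from->count; i++) {
		struct model_timer t = from->timers[i];

		if (t.pinned || to->count == MAX_TIMERS) {
			/* Compact the timers that remain on 'src'. */
			from->timers[kept++] = t;
		} else {
			to->timers[to->count++] = t;
			moved++;
		}
	}
	from->count = kept;

	return moved;
}
```

In this model, writing a target cpu number to a cpu's sysfs entry
would correspond to one call of migrate_timers() with that cpu as
'src' and the written number as 'dst'.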

This patch set is based on the kernel version 2.6.27.

Here's a brief introduction as to why we need timer migration.

An idle cpu on which device drivers have initialized timers, or which
holds timers that are (re)queued from softirq context, has to be woken
up frequently to service those timers. So, consolidating timers onto
fewer cpus is important, and that requires migrating timers from
idle cpus to less idle ones. Currently, timers are migrated only
during the cpu offline operation. However, cpu-hotplug is too
heavyweight for idle system power management. So, this patch set
implements a lightweight timer migration framework.

Also, on machines with a large number of CPUs, when utilization is
low but not 0%, we would want to consolidate all the system activity
onto as few packages as possible.

a) Interrupts are usually re-routed using the power-aware irqbalance.

b) For tasks, we have hooks in the scheduler which can consolidate tasks
onto fewer CPUs.

c) The remaining part of the system activity is the timers, which can be
queued from a task or a softirq context.

c-1) If they're queued from task context, then they migrate whenever the
task is migrated.

c-2) However, if they're requeued from softirq context, then it is
currently not possible to migrate them unless the CPU is offlined.

Hence the need for a minimal framework which can migrate the last
remaining system activity from the last few idle-but-serving-timers
CPUs of an otherwise idle package to a package which already has some
amount of system activity on it.

Also, this kind of framework will be helpful for a certain class of
applications, such as High Performance Computing (HPC) applications,
where we would want to restrict the system housekeeping activities to
as few CPUs as possible, in order to minimize the jitter caused by
these housekeeping activities.

Lastly, the algorithm which decides which cpu a timer should be
migrated to should be conservative: it should migrate only if the
target cpu is sufficiently busy, so that the target is not woken up
from an idle state just to service the timer. If the target cpu is
also idle, the wake-up penalty would be the same on either cpu, so
migrating gains nothing.
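
The conservative policy described above could be sketched roughly as
follows. This is only an illustration of the decision rule and is not
taken from the patches: struct cpu_load, cpu_is_busy(),
choose_timer_cpu() and the BUSY_THRESHOLD_PCT value are hypothetical,
and how "sufficiently busy" should actually be measured is left open
here.

```c
#include <stdbool.h>

/* Hypothetical cut-off for "sufficiently busy"; the value is arbitrary. */
#define BUSY_THRESHOLD_PCT 20

/* Hypothetical per-cpu load sample, as a percentage of busy time. */
struct cpu_load {
	unsigned int busy_pct;
};

static bool cpu_is_busy(const struct cpu_load *load)
{
	return load->busy_pct >= BUSY_THRESHOLD_PCT;
}

/*
 * Decide where a timer currently on 'src' should run.
 * Returns 'target' only if it is a valid, different cpu that is
 * already busy enough; otherwise the timer stays on 'src', because
 * waking an idle target costs as much as waking 'src'.
 */
static int choose_timer_cpu(const struct cpu_load *loads, int ncpus,
			    int src, int target)
{
	if (target < 0 || target >= ncpus || target == src)
		return src;

	if (!cpu_is_busy(&loads[target]))
		return src;

	return target;
}
```

The point of the sketch is that the default answer is "do not
migrate": the timer only moves when the target would have been awake
anyway.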

Tests carried out:

a) I have tested this patch set by stressing the system using a script
which continuously hotplug-adds and removes the cpus.

b) Also ran kernbench. The kernbench results with and without my patches
applied were fairly similar.
