[ANNOUNCE] 3.7-nohz1

From: Frederic Weisbecker
Date: Thu Dec 20 2012 - 13:39:25 EST


So this is a new version of the nohz cpusets tree, nominally based on 3.7, except it's not
using cpusets anymore and I actually based it on the middle of the 3.8 merge window
in order to get the latest upstream full dynticks preparatory work: cputime cleanups,
RCU user mode, context tracking subsystem, nohz code consolidation, ...

So the big changes since the last nohz cpuset release are:

* printk now uses irq work so it doesn't rely on the tick anymore (provided
your arch implements irq work with IPIs or the like). This chunk has been proposed
for the 3.8 merge window: https://lkml.org/lkml/2012/12/17/177
Maybe Linus will pull, maybe not. We'll see. In any case I've included it in this tree
but I'm not reposting this part of the patchset to avoid spamming you.

* cputime doesn't rely on IPIs anymore. Now the reader does a special computation to
remotely get the tickless cputime.

* No more cpusets interface. Paul McKenney suggested that I start with a boot time
kernel parameter to define the full dynticks cpumask. And he was totally right, it
makes the code much simpler. That's a good way to start and to make the mainlining
easier. We can still add a runtime configuration later if necessary.

* Now there is always a CPU handling the timekeeping. This can be further optimized
and made more power-friendly; I really did something simple and stupid. I guess we'll try
to get that into a better shape with Hakan. But at least the timekeeping now works.

* It uses the new RCU callbacks offloading feature. This way a full dynticks CPU doesn't
need to keep the tick to handle local callbacks. This is still very experimental though.

* No more specific IPI vector for full dynticks. We just use the scheduler IPI.

The branch is:


There is still quite some work to do.

== How to use? ==


You always need at least one timekeeping CPU.

Let's imagine you have 4 CPUs. We keep CPU 0 to offload RCU callbacks to it and to
handle the timekeeping. We set the rest as full dynticks. So you need the following kernel
parameters:
rcu_nocbs=1-3 full_nohz=1-3

(Note that the rcu_nocbs value must always be the same as the full_nohz value.)
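
A quick way to sanity-check that constraint after boot is to compare the two values on the
kernel command line. This is only a sketch: check_nohz_params is a hypothetical helper name,
and it does a plain string comparison, so "1-3" and "1,2,3" would count as different.

```shell
# Hypothetical helper: succeed only if rcu_nocbs= and full_nohz= are both
# present and carry the same value. With no argument it reads /proc/cmdline.
check_nohz_params() {
    cmdline=${1:-$(cat /proc/cmdline)}
    rcu=
    nohz=
    for arg in $cmdline; do
        case $arg in
            rcu_nocbs=*) rcu=${arg#rcu_nocbs=} ;;
            full_nohz=*) nohz=${arg#full_nohz=} ;;
        esac
    done
    # Plain string comparison; equivalent masks written differently won't match.
    [ -n "$rcu" ] && [ "$rcu" = "$nohz" ]
}
```

Usage: check_nohz_params || echo "rcu_nocbs and full_nohz differ or are missing"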

Now if you want proper isolation you need to:

* Migrate your processes adequately
* Migrate your irqs to CPU 0
* Migrate the RCU nocb threads to CPU 0. Example with the above configuration:

for p in $(ps -o pid= -C rcuo1,rcuo2,rcuo3); do
	taskset -cp 0 $p
done
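
The IRQ migration step from the list above could be done along these lines. It is a rough
sketch, not part of the tree: pin_irqs is a hypothetical helper name, it assumes the standard
/proc/irq/<N>/smp_affinity_list files, it must run as root, and it assumes irqbalance (or
anything similar) is not running to undo the change.

```shell
# Hypothetical helper: write a CPU list into every IRQ's smp_affinity_list.
# The second argument defaults to /proc/irq; point it at a scratch directory
# to try the loop without touching the real settings.
pin_irqs() {
    cpus=$1
    root=${2:-/proc/irq}
    for f in "$root"/*/smp_affinity_list; do
        [ -e "$f" ] || continue
        # Some IRQs refuse affinity changes; report and move on.
        echo "$cpus" > "$f" 2>/dev/null || echo "skipped $f" >&2
    done
}
```

Usage with the 4 CPU configuration above: pin_irqs 0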

Then run what you want on the full dynticks CPUs. For best results, run 1 task
per CPU, mostly in userspace and mostly CPU bound (otherwise more IO = more kernel
mode execution = more chances to get IPIs, tick restarted, workqueues, kthreads, etc...)

This page contains a good reminder for those interested in CPU isolation: https://github.com/gby/linux/wiki

But keep in mind that my tree is not yet ready for serious production.

Happy Christmas, new year or whatever end of the world.

Frederic Weisbecker (32):
irq_work: Fix racy IRQ_WORK_BUSY flag setting
irq_work: Fix racy check on work pending flag
irq_work: Remove CONFIG_HAVE_IRQ_WORK
nohz: Add API to check tick state
irq_work: Don't stop the tick with pending works
irq_work: Make self-IPIs optable
printk: Wake up klogd using irq_work
Merge branch 'nohz/printk-v8' into 3.7-nohz1-stage
context_tracking: Add comments on interface and internals
cputime: Generic on-demand virtual cputime accounting
cputime: Allow dynamic switch between tick/virtual based cputime accounting
cputime: Use accessors to read task cputime stats
cputime: Safely read cputime of full dynticks CPUs
nohz: Basic full dynticks interface
nohz: Assign timekeeping duty to a non-full-nohz CPU
nohz: Trace timekeeping update
nohz: Wake up full dynticks CPUs when a timer gets enqueued
rcu: Restart the tick on non-responding full dynticks CPUs
sched: Comment on rq->clock correctness in ttwu_do_wakeup() in nohz
sched: Update rq clock on nohz CPU before migrating tasks
sched: Update rq clock on nohz CPU before setting fair group shares
sched: Update rq clock on tickless CPUs before calling check_preempt_curr()
sched: Update rq clock earlier in unthrottle_cfs_rq
sched: Update clock of nohz busiest rq before balancing
sched: Update rq clock before idle balancing
sched: Update nohz rq clock before searching busiest group on load balancing
nohz: Move nohz load balancer selection into idle logic
nohz: Full dynticks mode
nohz: Only stop the tick on RCU nocb CPUs
nohz: Don't turn off the tick if rcu needs it
nohz: Don't stop the tick if posix cpu timers are running
nohz: Add some tracing

Steven Rostedt (2):
irq_work: Flush work on CPU_DYING
irq_work: Warn if there's still work on cpu_down

arch/alpha/Kconfig | 1 -
arch/alpha/kernel/osf_sys.c | 6 +-
arch/arm/Kconfig | 1 -
arch/arm64/Kconfig | 1 -
arch/blackfin/Kconfig | 1 -
arch/frv/Kconfig | 1 -
arch/hexagon/Kconfig | 1 -
arch/mips/Kconfig | 1 -
arch/parisc/Kconfig | 1 -
arch/powerpc/Kconfig | 1 -
arch/s390/Kconfig | 1 -
arch/s390/kernel/vtime.c | 4 +-
arch/sh/Kconfig | 1 -
arch/sparc/Kconfig | 1 -
arch/x86/Kconfig | 1 -
arch/x86/kernel/apm_32.c | 11 +-
drivers/isdn/mISDN/stack.c | 7 +-
drivers/staging/iio/trigger/Kconfig | 1 -
fs/binfmt_elf.c | 8 +-
fs/binfmt_elf_fdpic.c | 7 +-
include/asm-generic/cputime.h | 1 +
include/linux/context_tracking.h | 28 +++++
include/linux/hardirq.h | 4 +-
include/linux/init_task.h | 9 ++
include/linux/irq_work.h | 20 +++
include/linux/kernel_stat.h | 2 +-
include/linux/posix-timers.h | 1 +
include/linux/printk.h | 3 -
include/linux/rcupdate.h | 8 ++
include/linux/sched.h | 48 +++++++-
include/linux/tick.h | 26 ++++-
include/linux/vtime.h | 47 +++++---
init/Kconfig | 22 +++-
kernel/acct.c | 6 +-
kernel/context_tracking.c | 91 +++++++++++----
kernel/cpu.c | 4 +-
kernel/delayacct.c | 7 +-
kernel/exit.c | 6 +-
kernel/fork.c | 8 +-
kernel/irq_work.c | 131 ++++++++++++++++-----
kernel/posix-cpu-timers.c | 39 +++++-
kernel/printk.c | 36 +++---
kernel/rcutree.c | 19 +++-
kernel/rcutree_plugin.h | 13 +--
kernel/sched/core.c | 69 +++++++++++-
kernel/sched/cputime.c | 222 ++++++++++++++++++++++++++++++-----
kernel/sched/fair.c | 42 +++++++-
kernel/sched/sched.h | 15 +++
kernel/signal.c | 12 ++-
kernel/softirq.c | 11 +-
kernel/time/Kconfig | 9 ++
kernel/time/tick-broadcast.c | 3 +-
kernel/time/tick-common.c | 5 +-
kernel/time/tick-sched.c | 142 ++++++++++++++++++++---
kernel/timer.c | 3 +-
kernel/tsacct.c | 19 ++-
56 files changed, 955 insertions(+), 233 deletions(-)
