[RFC GIT PULL] printk: Full dynticks support for 3.8

From: Frederic Weisbecker
Date: Mon Dec 17 2012 - 09:41:57 EST


We are currently working on extending the dynticks mode to broader contexts than just idle.
Under some conditions the tick can be avoided even on a busy CPU (no need for preemption when
only one task is running, no need for RCU state machine maintenance in userspace, etc.).

The most popular application of this is the implementation of CPU isolation. On HPC
workloads, where people run one task per CPU in order to maximize CPU performance,
the kernel gets in the way too much with these often unnecessary interrupts.

The result is a performance loss due to stolen CPU time and cache thrashing of
the userspace working set.

Now CPU isolation is the most famous user. I expect more. For example we should be able
to avoid the tick when we run in guest mode. And more generally this may be a win
for most CPU-bound workloads.

So in order to implement this full dynticks mode, we need to find alternatives to
the many maintenance operations that are performed periodically and turn them into
one-shot, event-driven solutions.

printk() is part of the problem. It must be safely callable from most places, and for
that purpose it wakes up the readers asynchronously: printk_tick() checks on every tick
for pending messages and waiting readers.

Of course if we use printk() while the tick is stopped, the pending readers may not be woken
up for a while. So a solution to make printk() work even if the CPU is in dynticks mode
is to use the irq_work subsystem. This subsystem is typically able to fire self-IPIs.
So when printk() is called, it now enqueues an irq_work that does the asynchronous wakeup:

* If the tick is stopped, it raises a self-IPI.
* If the tick is running periodically, it doesn't fire a self-IPI but waits for the next tick
  to handle the work instead (irq_work is also checked from the timer tick). This avoids a storm
  of self-IPIs when printk() is called frequently within a short period of time.
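
The two cases above can be illustrated with a small userspace model. This is only a
sketch of the policy as described in this mail, not the code in kernel/irq_work.c:
struct model_cpu, model_queue_lazy() and model_run_pending() are hypothetical names,
and the "self-IPI" is just a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only: none of these names exist in the kernel. */
struct model_cpu {
	bool tick_stopped;     /* true when the CPU is in dynticks mode */
	int  pending;          /* lazy works queued and not yet run */
	int  self_ipis_raised; /* how many self-IPIs the model "sent" */
	int  works_run;        /* total number of works executed so far */
};

/*
 * Queue one lazy work on the modelled CPU.
 * Returns true if a self-IPI had to be raised, false if the work is
 * simply left for the next periodic tick to pick up.
 */
static bool model_queue_lazy(struct model_cpu *cpu)
{
	assert(cpu != NULL);

	cpu->pending++;
	if (cpu->tick_stopped) {
		/* Nobody will look at the queue soon: kick ourselves. */
		cpu->self_ipis_raised++;
		return true;
	}
	/* Tick is running: it will run the work, no IPI needed. */
	return false;
}

/*
 * Run everything that is pending. In the model this stands for both
 * the timer tick path and the self-IPI handler.
 * Returns the number of works that were run.
 */
static int model_run_pending(struct model_cpu *cpu)
{
	assert(cpu != NULL);

	int ran = cpu->pending;

	cpu->works_run += ran;
	cpu->pending = 0;
	return ran;
}
```

With the tick running, any number of model_queue_lazy() calls leaves self_ipis_raised
at zero and a single model_run_pending() call drains them all, which is the batching
that avoids the IPI storm.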

I know this is a sensitive area. We want printk() to stay minimal and not rely too much
on other subsystems that add complications and that may use printk() themselves.
That's why we chose irq_work:

- It's pretty small and self-contained
- It's lockless
- It handles most recursion cases (if it uses printk() itself from the IPI path, this won't
  fire another IPI)
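
This mail does not spell out how the lockless queueing avoids redundant work. One common
way to get that kind of property, shown here purely as an assumption, is an atomically
claimed "pending" flag: a work that is already queued cannot be queued a second time.
The sketch below uses C11 atomics in userspace; MODEL_WORK_PENDING, struct model_work,
model_work_claim() and model_work_begin_run() are hypothetical names, not the irq_work API.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_WORK_PENDING 1u /* hypothetical flag bit, not a kernel constant */

/* Userspace model of a work item carrying only its state flags. */
struct model_work {
	atomic_uint flags;
};

/*
 * Try to claim the work for queueing, without taking any lock.
 * Returns true if the caller won the claim and should enqueue the work,
 * false if the work was already pending and nothing more needs to be done.
 */
static bool model_work_claim(struct model_work *w)
{
	assert(w != NULL);

	/* Atomically set the bit and learn whether it was already set. */
	unsigned int old = atomic_fetch_or(&w->flags, MODEL_WORK_PENDING);

	return (old & MODEL_WORK_PENDING) == 0;
}

/*
 * Called by the modelled handler right before it runs the callback:
 * clearing the bit first lets the work be claimed again afterwards.
 */
static void model_work_begin_run(struct model_work *w)
{
	assert(w != NULL);

	atomic_fetch_and(&w->flags, ~MODEL_WORK_PENDING);
}
```

In this model a second model_work_claim() on an already pending work returns false, so
the caller has nothing to enqueue and no reason to raise anything.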

But because it's sensitive, I'm proposing it as an RFC pull request.

So if you're ok with that, please pull from:


HEAD: 74876a98a87a115254b3a66a14b27320b7f0acaa "printk: Wake up klogd using irq_work"

It has been in linux-next.


Support for printk in dynticks mode:

* Fix two races in irq work claiming

* Generalize irq_work support to all archs

* Don't stop tick with irq works pending. This
fix is generally useful and concerns archs that
can't raise self IPIs.

* Flush irq works before CPU offlining.

* Introduce "lazy" irq works that can wait for the
next tick to be executed, unless it's stopped.

* Implement klogd wake up using irq work. This
removes the ad-hoc printk_tick()/printk_needs_cpu()
hooks and makes it work even in dynticks mode.

Signed-off-by: Frederic Weisbecker <fweisbec@xxxxxxxxx>

Frederic Weisbecker (7):
      irq_work: Fix racy IRQ_WORK_BUSY flag setting
      irq_work: Fix racy check on work pending flag
      irq_work: Remove CONFIG_HAVE_IRQ_WORK
      nohz: Add API to check tick state
      irq_work: Don't stop the tick with pending works
      irq_work: Make self-IPIs optable
      printk: Wake up klogd using irq_work

Steven Rostedt (2):
      irq_work: Flush work on CPU_DYING
      irq_work: Warn if there's still work on cpu_down

 arch/alpha/Kconfig                  |   1 -
 arch/arm/Kconfig                    |   1 -
 arch/arm64/Kconfig                  |   1 -
 arch/blackfin/Kconfig               |   1 -
 arch/frv/Kconfig                    |   1 -
 arch/hexagon/Kconfig                |   1 -
 arch/mips/Kconfig                   |   1 -
 arch/parisc/Kconfig                 |   1 -
 arch/powerpc/Kconfig                |   1 -
 arch/s390/Kconfig                   |   1 -
 arch/sh/Kconfig                     |   1 -
 arch/sparc/Kconfig                  |   1 -
 arch/x86/Kconfig                    |   1 -
 drivers/staging/iio/trigger/Kconfig |   1 -
 include/linux/irq_work.h            |  20 +++++
 include/linux/printk.h              |   3 -
 include/linux/tick.h                |  17 ++++-
 init/Kconfig                        |   5 +-
 kernel/irq_work.c                   | 131 ++++++++++++++++++++++++++--------
 kernel/printk.c                     |  36 +++++----
 kernel/time/tick-sched.c            |   7 +-
 kernel/timer.c                      |   1 -
 22 files changed, 161 insertions(+), 73 deletions(-)

