[PATCH v2] watchdog: nohz: don't run watchdog on nohz_full cores

From: Chris Metcalf
Date: Mon Mar 30 2015 - 15:33:14 EST


On 03/30/2015 03:12 PM, Don Zickus wrote:
On Mon, Mar 30, 2015 at 02:51:05PM -0400, cmetcalf@xxxxxxxxxx wrote:
From: Chris Metcalf <cmetcalf@xxxxxxxxxx>

Running watchdog can be a helpful debugging feature on regular
cores, but it's incompatible with nohz_full, since it forces
regular scheduling events. Accordingly, just exit out immediately
from any nohz_full core.

An alternate approach would be to add a flags field or function to
smp_hotplug_thread to control on which cores the percpu threads
are created, but it wasn't clear that much mechanism was useful.
Hi Chris,

It seems like the correct solution would be to hook into the idle_loop
somehow. If the cpu is idle, then it seems unlikely that a lockup could
occur.

With nohz_full, though, the cpu might be running userspace code
with the intention of keeping kernel ticks disabled. Even returning
to kernel mode to try to figure out if we "should" be running the
watchdog on a given core will induce exactly the kind of interrupts
that nohz_full is designed to prevent.

My assumption is generally that nohz_full cores don't spend a lot of
time in the kernel anyway, as they are optimized for user space.

I guess you could imagine doing something per-cpu on the nohz_full
cores where we effectively call watchdog_disable() whenever a
nohz_full core enters userspace, and watchdog_enable() whenever it
enters the kernel. We could add some per-cpu state in the watchdog
code to track whether that core was currently enabled or disabled
to avoid double-enabling or double-disabling. I would think
context_tracking_user_exit()/_enter() would be the place to do this.

This feels like a lot of overhead, potentially. Thoughts?

My fear with this apporach is a lockup would occur on the nohz cpu and it
would go undetected because that cpu is disabled. Further no printk is
thrown out to even indicate a cpu is disabled making it more difficult to
debug.

Assuming we stick with my simple "exit the watchdog thread" model,
we probably likely wouldn't want to warn for every nohz_full core, but a
one-shot message makes sense, e.g. just "watchdog: disabled on
nohz_full cores".

Signed-off-by: Chris Metcalf <cmetcalf@xxxxxxxxxx>
---
kernel/watchdog.c | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 3174bf8e3538..8a46d9d8a66f 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -19,6 +19,7 @@
#include <linux/sysctl.h>
#include <linux/smpboot.h>
#include <linux/sched/rt.h>
+#include <linux/tick.h>
#include <asm/irq_regs.h>
#include <linux/kvm_para.h>
@@ -431,6 +432,10 @@ static void watchdog_enable(unsigned int cpu)
hrtimer_init(hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
hrtimer->function = watchdog_timer_fn;
+ /* nohz_full cpus do not do watchdog checking. */
+ if (tick_nohz_full_cpu(cpu))
+ do_exit(0);
+
/* Enable the perf event */
watchdog_nmi_enable(cpu);
--
2.1.2


--
Chris Metcalf, EZChip Semiconductor
http://www.ezchip.com

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/