Re: [PATCH v2] watchdog: nohz: don't run watchdog on nohz_full cores

From: Don Zickus
Date: Mon Mar 30 2015 - 16:02:43 EST


On Mon, Mar 30, 2015 at 03:32:55PM -0400, Chris Metcalf wrote:
> On 03/30/2015 03:12 PM, Don Zickus wrote:
> >On Mon, Mar 30, 2015 at 02:51:05PM -0400, cmetcalf@xxxxxxxxxx wrote:
> >>From: Chris Metcalf <cmetcalf@xxxxxxxxxx>
> >>
> >>Running watchdog can be a helpful debugging feature on regular
> >>cores, but it's incompatible with nohz_full, since it forces
> >>regular scheduling events. Accordingly, just exit out immediately
> >>from any nohz_full core.
> >>
> >>An alternate approach would be to add a flags field or function to
> >>smp_hotplug_thread to control on which cores the percpu threads
> >>are created, but it wasn't clear that much mechanism was useful.
> >Hi Chris,
> >
> >It seems like the correct solution would be to hook into the idle_loop
> >somehow. If the cpu is idle, then it seems unlikely that a lockup could
> >occur.
>
> With nohz_full, though, the cpu might be running userspace code
> with the intention of keeping kernel ticks disabled. Even returning
> to kernel mode to try to figure out if we "should" be running the
> watchdog on a given core will induce exactly the kind of interrupts
> that nohz_full is designed to prevent.
>
> My assumption is generally that nohz_full cores don't spend a lot of
> time in the kernel anyway, as they are optimized for user space.
>
> I guess you could imagine doing something per-cpu on the nohz_full
> cores where we effectively call watchdog_disable() whenever a
> nohz_full core enters userspace, and watchdog_enable() whenever it
> enters the kernel. We could add some per-cpu state in the watchdog
> code to track whether that core was currently enabled or disabled
> to avoid double-enabling or double-disabling. I would think
> context_tracking_user_exit()/_enter() would be the place to do this.
>
> This feels like a lot of overhead, potentially. Thoughts?

A few months ago I might have thought that a reasonable approach. But
recently we have added code to make the watchdog an all or nothing approach
across the system. This might make it difficult to do what you are
suggesting.

I do not know enough about the nohz code to know what the right approach is
here. Perhaps Federic can enlighten me?


Cheers,
Don

>
> >My fear with this apporach is a lockup would occur on the nohz cpu and it
> >would go undetected because that cpu is disabled. Further no printk is
> >thrown out to even indicate a cpu is disabled making it more difficult to
> >debug.
>
> Assuming we stick with my simple "exit the watchdog thread" model,
> we probably likely wouldn't want to warn for every nohz_full core, but a
> one-shot message makes sense, e.g. just "watchdog: disabled on
> nohz_full cores".
>
> >>Signed-off-by: Chris Metcalf <cmetcalf@xxxxxxxxxx>
> >>---
> >> kernel/watchdog.c | 5 +++++
> >> 1 file changed, 5 insertions(+)
> >>
> >>diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> >>index 3174bf8e3538..8a46d9d8a66f 100644
> >>--- a/kernel/watchdog.c
> >>+++ b/kernel/watchdog.c
> >>@@ -19,6 +19,7 @@
> >> #include <linux/sysctl.h>
> >> #include <linux/smpboot.h>
> >> #include <linux/sched/rt.h>
> >>+#include <linux/tick.h>
> >> #include <asm/irq_regs.h>
> >> #include <linux/kvm_para.h>
> >>@@ -431,6 +432,10 @@ static void watchdog_enable(unsigned int cpu)
> >> hrtimer_init(hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
> >> hrtimer->function = watchdog_timer_fn;
> >>+ /* nohz_full cpus do not do watchdog checking. */
> >>+ if (tick_nohz_full_cpu(cpu))
> >>+ do_exit(0);
> >>+
> >> /* Enable the perf event */
> >> watchdog_nmi_enable(cpu);
> >>--
> >>2.1.2
> >>
>
> --
> Chris Metcalf, EZChip Semiconductor
> http://www.ezchip.com
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/