Re: [PATCH] watchdog: Make sure the watchdog thread gets CPU on loaded system

From: Mandeep Singh Baines
Date: Wed Mar 14 2012 - 16:59:57 EST


Don Zickus (dzickus@xxxxxxxxxx) wrote:
> From: Michal Hocko <mhocko@xxxxxxx>
>
> If the system is loaded while hotplugging a CPU, we might end up with a bogus
> hard lockup detection. This has been seen while running the LTP pounder test
> in parallel with a hotplug test.
>
> The main problem is that watchdog_enable (called when a CPU is brought up)
> registers a perf event which periodically checks the per-cpu counter
> (hrtimer_interrupts). That counter is updated from an hrtimer callback, but
> the hrtimer itself is only started from the watchdog kernel thread.
>
> This means that while the hard lockup check is already active, the kernel
> thread might still be sitting on the runqueue behind zillions of tasks, so
> nobody updates the counter we rely on and we KABOOM with a bogus hard lockup.
>
> Let's fix this by boosting the watchdog thread priority before we wake it up
> rather than once it is already running.
> This still doesn't handle the case where other SCHED_FIFO tasks run at the
> same (maximum) priority, but that doesn't seem to be common. The current
> implementation doesn't handle that case either, so at least this is not worse.
>
> Unfortunately, we cannot start the perf counter from the watchdog thread,
> because we could miss a real lockup, and we also cannot start the hrtimer from
> watchdog_enable, because there is no way (at least none I know of) to start an
> hrtimer on a given CPU from a different CPU.
>
> [fix compile issue with param -dcz]
>
> Cc: Ingo Molnar <mingo@xxxxxxx>
> Cc: Peter Zijlstra <a.p.zijlstra@xxxxxxxxx>
> Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> Cc: Mandeep Singh Baines <msb@xxxxxxxxxxxx>
> Signed-off-by: Michal Hocko <mhocko@xxxxxxx>
> Signed-off-by: Don Zickus <dzickus@xxxxxxxxxx>

Reviewed-by: Mandeep Singh Baines <msb@xxxxxxxxxxxx>

> ---
> kernel/watchdog.c | 7 +++----
> 1 files changed, 3 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> index d117262..6618cde 100644
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -321,11 +321,9 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
> */
> static int watchdog(void *unused)
> {
> - struct sched_param param = { .sched_priority = MAX_RT_PRIO-1 };
> + struct sched_param param = { .sched_priority = 0 };
> struct hrtimer *hrtimer = &__raw_get_cpu_var(watchdog_hrtimer);
>
> - sched_setscheduler(current, SCHED_FIFO, &param);
> -
> /* initialize timestamp */
> __touch_watchdog();
>
> @@ -350,7 +348,6 @@ static int watchdog(void *unused)
> set_current_state(TASK_INTERRUPTIBLE);
> }
> __set_current_state(TASK_RUNNING);
> - param.sched_priority = 0;
> sched_setscheduler(current, SCHED_NORMAL, &param);
> return 0;
> }
> @@ -439,6 +436,7 @@ static int watchdog_enable(int cpu)
>
> /* create the watchdog thread */
> if (!p) {
> + struct sched_param param = { .sched_priority = MAX_RT_PRIO-1 };
> p = kthread_create_on_node(watchdog, NULL, cpu_to_node(cpu), "watchdog/%d", cpu);
> if (IS_ERR(p)) {
> printk(KERN_ERR "softlockup watchdog for %i failed\n", cpu);
> @@ -450,6 +448,7 @@ static int watchdog_enable(int cpu)
> }
> goto out;
> }
> + sched_setscheduler(p, SCHED_FIFO, &param);
> kthread_bind(p, cpu);
> per_cpu(watchdog_touch_ts, cpu) = 0;
> per_cpu(softlockup_watchdog, cpu) = p;
> --
> 1.7.7.6
>