Re: [PATCH] softlockup detection vs. cpu hotplug

From: Ingo Molnar
Date: Fri Feb 24 2006 - 01:25:10 EST



* Nathan Lynch <ntl@xxxxxxxxx> wrote:

> @@ -86,10 +92,15 @@ static int watchdog(void * __bind_cpu)
> */
> while (!kthread_should_stop()) {
> msleep_interruptible(1000);
> - touch_softlockup_watchdog();
> + /* When our cpu is offlined the watchdog thread can
> + * get migrated before it is stopped.
> + */
> + preempt_disable();
> + if (likely(smp_processor_id() == bind_cpu))
> + touch_softlockup_watchdog();
> + preempt_enable();
> + __set_current_state(TASK_RUNNING);
> }
> - __set_current_state(TASK_RUNNING);
> -
> return 0;
> }
>

the above change is unnecessary: there is absolutely no harm from a
migrated watchdog thread touching another CPU's timestamp for a short
amount of time. [Furtermore, why doesnt the hotplug CPU code use the
kthread_should_stop() mechanism to gracefully stop per-CPU threads,
instead of migrating unsuspecting threads and putting ugly hooks into
every such thread?]

the other fix (to touch before bringing the CPU up) looks OK, but it's
simpler to initialize it via the oneliner below. Andrew, this is what
i'd suggest to do for v2.6.16. [i'll send an updated softirq-rework
patch in the next mail.]

Ingo

------

fix from Nathan Lynch: initialize the softlockup timestamp of freshly
brought up CPUs with jiffies.

Signed-off-by: Ingo Molnar <mingo@xxxxxxx>

--- kernel/softlockup.c.orig
+++ kernel/softlockup.c
@@ -110,6 +110,7 @@ cpu_callback(struct notifier_block *nfb,
printk("watchdog for %i failed\n", hotcpu);
return NOTIFY_BAD;
}
+ per_cpu(timestamp, hotcpu) = jiffies;
per_cpu(watchdog_task, hotcpu) = p;
kthread_bind(p, hotcpu);
break;
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/