Re: NMI watchdog + NOHZ question

From: David Miller
Date: Wed Jun 24 2009 - 06:32:43 EST


From: Andi Kleen <andi@xxxxxxxxxxxxxx>
Date: Wed, 24 Jun 2009 12:23:25 +0200

>> And similarly to sparc64, if that 5+ second qla2xxx interrupt
>> sequence happens after the tick_nohz_stop_sched_tick() call
>> we can run into the same situation.
>
> Yes it would be probably safer to do the tick disabling with
> interrupts off already.

That only makes sense if you're really putting the cpu to sleep
until an interrupt or similar happens.

Here in this sparc64 case I'm not, I just spin waiting for the
cpu_idle() exit conditions.
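
To make the failure mode concrete, here is a toy user-space model of it.
This is a sketch and not kernel code: it assumes the watchdog NMI decides
the cpu is stuck when a per-cpu tick counter has not advanced for some
number of consecutive NMIs. The names cpu_model, timer_period(),
watchdog_nmi() and spin_idle() are all made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one cpu; none of these names exist in the kernel. */
struct cpu_model {
	unsigned long tick_count;   /* bumped by the periodic timer tick */
	unsigned long last_seen;    /* tick_count seen at the previous NMI */
	unsigned int stalled_nmis;  /* consecutive NMIs with no tick progress */
	bool tick_stopped;          /* set once the NOHZ code stops the tick */
};

/* One timer period: the counter only moves while the tick is running. */
static void timer_period(struct cpu_model *cpu)
{
	if (!cpu->tick_stopped)
		cpu->tick_count++;
}

/* Watchdog NMI: returns true when it would report a lockup. */
static bool watchdog_nmi(struct cpu_model *cpu, unsigned int threshold)
{
	assert(threshold > 0);

	if (cpu->tick_count != cpu->last_seen) {
		/* The tick made progress, so the cpu is considered alive. */
		cpu->last_seen = cpu->tick_count;
		cpu->stalled_nmis = 0;
		return false;
	}
	return ++cpu->stalled_nmis >= threshold;
}

/*
 * Spin in the idle loop for 'periods' timer periods, with one watchdog
 * NMI per period.  Returns true if the watchdog fired while spinning.
 */
static bool spin_idle(struct cpu_model *cpu, unsigned int periods,
		      unsigned int threshold)
{
	for (unsigned int i = 0; i < periods; i++) {
		timer_period(cpu);
		if (watchdog_nmi(cpu, threshold))
			return true;
	}
	return false;
}
```

In this model a cpu spinning with the tick still running never trips the
watchdog, while one spinning with tick_stopped set trips it after
'threshold' NMIs even though nothing is actually hung.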

I'll think more about how I'll handle this. It's at least a relief to
understand exactly what causes this issue now :-)

> These days NMI watchdog is not used much on x86 anymore because it's
> default off, so probably people never noticed that.

I really didn't want to provide the feature that way on sparc64, which
is why I made it on by default. It would be interesting to reconsider
x86's default, perhaps even only on a trial basis in -next.

It's so useful, and in the short time sparc64 has had this NMI code I
can count at least 8 bugs I've fixed only because it was on all the
time.