On Sun, 11 Mar 2001, Andrew Morton wrote:
> Sorry, this doesn't look right. Are you sure you booted with
> `nmi_watchdog=1'? It was turned off by default in -ac18.

of course i did ...

> Two things:
>
> - CPU A could be doing the SYSRQ printing, while
> CPU B is spinning on a lock which CPU A holds. The
> NMI watchdog will then whack CPU B. So touch_nmi_watchdog()
> needs to touch *all* CPUs. (kbd_controller_lock, for example).

yep, agreed.

> - We need to touch the NMI more than once during the
> SYSRQ-T output - five seconds isn't enough.
>
> The correctest way is, I think, to touch_nmi() in
> rs285_console_write(), lp_console_write() and
> serial_console_write().
nope:
> We _could_ just touch it in show_state(), but that means
> we still get whacked if we do a lot of printk()s with interrupts
> disabled from some random place in the kernel.
exactly, and that is a feature. We want to find all those places, because
disabling IRQs for too long can cause problems in unrelated kernel code.
SysRq-T is a special case, so touch_nmi() is justified in that case and in
that case only. The NMI watchdog is a safety net, and we want to be very
conservative about disabling its effect.
[i've attached nmi-watchdog-2.4.2-A2 (against -ac18) which adds your fix
to clear all alert counters in touch_nmi_watchdog().]
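
The idea of clearing every CPU's counter can be modelled in user space roughly as below. This is a sketch only and not the contents of the attached patch; NR_CPUS_MODEL, ALERT_LIMIT and the function names are assumptions chosen for illustration.

```c
#include <assert.h>

#define NR_CPUS_MODEL 4   /* assumption: small fixed CPU count for the model */
#define ALERT_LIMIT   5   /* assumption: stuck ticks before the watchdog fires */

/* One alert counter per CPU, as in the scheme discussed in this thread. */
static unsigned int alert_counter[NR_CPUS_MODEL];

/*
 * One watchdog tick on a CPU that made no progress (for example, it is
 * spinning on a lock held by the CPU doing the printing).
 * Returns 1 if the modelled watchdog would fire on this tick.
 */
static int nmi_tick_stuck(int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS_MODEL);
    return ++alert_counter[cpu] >= ALERT_LIMIT;
}

/*
 * Clear the counters of all CPUs, not just the caller's, so that a CPU
 * waiting on the printing CPU is not whacked either.
 */
static void touch_nmi_watchdog_model(void)
{
    for (int cpu = 0; cpu < NR_CPUS_MODEL; cpu++)
        alert_counter[cpu] = 0;
}
```

In this model, a touch issued by the printing CPU also gives a CPU spinning on its lock a fresh ALERT_LIMIT ticks before the watchdog would fire.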
Ingo