Re: [PATCH] Add mem_nmi_panic enable system to panic on hard error

From: Dave Jones
Date: Tue Dec 13 2005 - 01:48:54 EST


On Tue, Dec 13, 2005 at 03:38:16PM +0900, Hidetoshi Seto wrote:
> Some x86 server fires NMI with reason 0x80 up if a parity error
> occurs on its PCI-bus system or DIMM module.

Hmm, are you sure this isn't a bios error misconfiguring
some northbridge register perhaps ? Some chipsets offer
such reporting as a feature. Could be your server has this
on by default.

(I believe the EDAC code has also triggered similar cases
on certain cards which is why it too disables this checking
by default).

> However, such NMI cannot stop the system and data pollution,
> because the NMI is _not_ "unknown", through unknown_nmi_panic
> can panic the system on NMI which has no reason bits.
>
> This patch adds "mem_nmi_panic" sysctl parameter that enable
> system to switch its action on such NMI originated in a hard error.
> Also it seems that x86_86 has same situation and problem,
> so this is a couple of patch for 2 archs, i386 and x86_64.

Why not make this automatic based on dmi strings, instead of
making the user guess that he needs to pass obscure command
line options?

The sysctl seems pointless too. If this is needed at all,
why would you ever want to turn it off ?

Dave

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/