Re: [RFC 5/6] x86, NMI, Add support to notify hardware error with unknown NMI

From: Don Zickus
Date: Fri Sep 10 2010 - 12:02:30 EST


> @@ -349,6 +351,14 @@ io_check_error(unsigned char reason, str
> static notrace __kprobes void
> unknown_nmi_error(unsigned char reason, struct pt_regs *regs)
> {
> + /*
> + * On some platforms, hardware errors may be notified via
> + * unknown NMI
> + */
> + if (unknown_nmi_for_hwerr)
> + panic("NMI for hardware error without error record: "
> + "Not continuing");
> +
> #ifdef CONFIG_MCA

I'm not sure I agree with this. I still see PCI SERRs that do not come in
through port 0x61 and instead get routed to unknown_nmi_error. I don't think
we should just assume that it is an APEI/HEST error and panic the box.

Also, all the perf problems we have seen recently have been going through
that path as we slowly try to figure out why we are not catching those
unknown NMIs.

I am grasping at straws here, but is there a register that APEI/HEST can
poke to see if it generated the NMI?
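
To make the concern concrete, here is a minimal user-space sketch of the
dispatch in question. It is not kernel code and not a proposed patch. The
0x80/0x40 bits are the port 0x61 reason bits the NMI path already tests
before falling through to unknown_nmi_error; the macro names, the enum,
classify_nmi() and the apei_claimed_nmi flag are all assumptions made up for
illustration. In particular, apei_claimed_nmi stands in for the hypothetical
"register APEI/HEST can poke", which may not exist.

```c
/*
 * User-space sketch only. Shows panicking on an unknown NMI only when
 * some source-identifying check (hypothetical) says APEI/HEST raised it.
 */
#include <stdbool.h>

/* Port 0x61 reason bits; the macro names here are illustrative. */
#define NMI_REASON_SERR   0x80	/* bit 7: memory parity / PCI SERR */
#define NMI_REASON_IOCHK  0x40	/* bit 6: I/O channel check */

enum nmi_action {
	NMI_HANDLE_SERR,	/* would go to mem_parity_error() */
	NMI_HANDLE_IOCHK,	/* would go to io_check_error() */
	NMI_PANIC_HWERR,	/* hardware error without error record */
	NMI_REPORT_UNKNOWN,	/* existing unknown-NMI reporting */
};

/*
 * Simplification: returns the first matching action, whereas the real
 * handler can act on both reason bits in the same NMI.
 *
 * unknown_nmi_for_hwerr: the platform flag from the quoted patch.
 * apei_claimed_nmi: HYPOTHETICAL result of asking APEI/HEST whether
 *                   it generated this NMI.
 */
enum nmi_action classify_nmi(unsigned char reason,
			     bool unknown_nmi_for_hwerr,
			     bool apei_claimed_nmi)
{
	if (reason & NMI_REASON_SERR)
		return NMI_HANDLE_SERR;
	if (reason & NMI_REASON_IOCHK)
		return NMI_HANDLE_IOCHK;

	/* Neither bit set: this is the unknown_nmi_error() path. */
	if (unknown_nmi_for_hwerr && apei_claimed_nmi)
		return NMI_PANIC_HWERR;

	/* Stray SERRs and perf NMIs that were not claimed end up here. */
	return NMI_REPORT_UNKNOWN;
}
```

With the flag set but no claim from APEI/HEST, classify_nmi(0, true, false)
yields NMI_REPORT_UNKNOWN, so a stray SERR or perf NMI would not panic the
box, while classify_nmi(0, true, true) yields NMI_PANIC_HWERR.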

Cheers,
Don