Re: [PATCH -v2 6/7] x86, NMI, Add support to notify hardware error with unknown NMI

From: huang ying
Date: Wed Sep 29 2010 - 04:07:29 EST


On Tue, Sep 28, 2010 at 11:27 PM, Robert Richter <robert.richter@xxxxxxx> wrote:
> On 27.09.10 21:19:21, Huang Ying wrote:
>> On Mon, 2010-09-27 at 21:38 +0800, Robert Richter wrote:
>> > On 27.09.10 08:47:53, huang ying wrote:
>> >
>> > > >>  arch/x86/kernel/hwerr.c   |  55 +++++++++++++++++++++++++++++++++++++++++++++
>> > > >
>> > > > Instead of creating this file the code should be implemented in
>> > > >
>> > > >  arch/x86/kernel/cpu/intel.c
>> > > >
>> > > > Similar AMD NB code is implemented in amd.c and k8.c.
>> > >
>> > > Why? This file is not vendor specific.
>> >
>> > No, it only implements an Intel specific PCI device, nothing else.
>>
>> You can add AMD specific PCI device here too. We will add more device ID
>> in the future.
>
> I think it is not worth introducing this file. There is no generic
> code in it and we have other places for vendor specific code.

It's not vendor specific code. It is general code. In fact it is a
white list of systems on which an unknown NMI can be treated as a
hardware error (that is, systems without broken hardware that generates
unknown NMIs for other reasons). If you can find an appropriate
existing file, I would be glad to move the contents of this file into it.
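
As a rough illustration of the white list idea (this is not the code from
the patch), the check could look something like the sketch below. The
struct, the function names, and the 0x1234 device ID are made up for
illustration; only the Intel vendor ID 0x8086 is real, and HEST presence
is reduced to a plain boolean argument.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table entry: PCI vendor/device ID of a host bridge. */
struct host_bridge_id {
	uint16_t vendor;
	uint16_t device;
};

#define EXAMPLE_VENDOR_ID 0x8086 /* Intel's PCI vendor ID */
#define EXAMPLE_DEVICE_ID 0x1234 /* placeholder, not a real chipset ID */

/* White list of host bridges assumed not to raise spurious unknown NMIs. */
static const struct host_bridge_id hwerr_nmi_whitelist[] = {
	{ EXAMPLE_VENDOR_ID, EXAMPLE_DEVICE_ID },
};

/* Return true if the given host bridge ID appears in the white list. */
static bool host_bridge_whitelisted(uint16_t vendor, uint16_t device)
{
	size_t n = sizeof(hwerr_nmi_whitelist) / sizeof(hwerr_nmi_whitelist[0]);
	size_t i;

	for (i = 0; i < n; i++) {
		if (hwerr_nmi_whitelist[i].vendor == vendor &&
		    hwerr_nmi_whitelist[i].device == device)
			return true;
	}
	return false;
}

/*
 * Treat an unknown NMI as a hardware error only if the machine has HEST
 * or its host bridge is on the white list.
 */
static bool unknown_nmi_is_hwerr(bool has_hest, uint16_t vendor,
				 uint16_t device)
{
	return has_hest || host_bridge_whitelisted(vendor, device);
}
```

Nothing in the lookup itself is tied to one vendor: supporting another
chipset would only mean adding a row to the table.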

>> No. We do NOT catch unknown NMIs for specific hardware here. We put the
>> code here because we think it is general instead of hardware specific.
>>
>> It should be a general rule to treat unknown NMI as hardware error. But
>> to avoid confusing users who have broken hardware (which generates
>> unknown NMIs that are not hardware errors), we use a white list (machines
>> with HEST or a workable chipset identified via PCI ID).
>
> Ok, a white list makes sense. This was not obvious in your
> implementation.

My original code has a comment about this:

+/*
+ * On some platforms, hardware errors may be notified via unknown
+ * NMI. These platforms are identified via the PCI ID of the host bridge.
+ *
+ * The PCI ID of the host bridge is used instead of the DMI ID, so that
+ * the check is based on the platform rather than the motherboard.
+ * This should be simpler and sufficient.
+ */

If you think that is not obvious enough, I will change the comment to
make it more obvious.

Best Regards,
Huang Ying