Re: Hardware Error Kernel Mini-Summit

From: Nils Carlson
Date: Tue Jun 15 2010 - 04:08:02 EST


On Tue, 15 Jun 2010, Andi Kleen wrote:

> On Mon, Jun 14, 2010 at 04:46:40PM -0700, Doug Thompson wrote:
>
> Hi Doug,
>
> >
> > Maybe I didn't see it covered (or I missed it), but EDAC is used on more than just x86-based machines, though they are the majority by volume. We should have an abstraction that covers all the archs, like we do with other subsystems of Linux.
>
> The way I envision it working is that an abstracted DIMM interface
> (or edac2 or whatever you want to call it) can be fed from any reasonable
> DIMM layout driver. This could be either DMI on x86 or some other
> driver. There would be nothing really x86 specific about that.

Could you maybe provide some references on how DIMM layout
could be read from DMI? I can't find anything nearly this specific.
Or is it something we're expecting to happen in future BIOSes?

Also, there would probably need to be some standard describing
different DIMM layouts in general, though maybe such a thing exists.

In other words, there would have to be some way of ascertaining
that the info you read from DMI is sufficient to decode MCEs so that
a faulting DIMM can be identified. In an ideal world, this could
be tested by some simple tool that could be run by the BIOS writers
to test that they're providing the OS with sufficient info.
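
As a rough illustration of what such a tool might check, here is a
minimal Python sketch. It assumes the raw bytes of each SMBIOS type 17
(Memory Device) structure have already been obtained somehow, and it
only tests that every populated slot has a non-empty, unique locator.
That criterion is my assumption and is far weaker than what decoding an
MCE address to a DIMM would really require. The function names are
hypothetical; the field offsets (Size at 0x0C, Device Locator at 0x10,
Bank Locator at 0x11) follow the SMBIOS Memory Device layout.

```python
import struct


def parse_memory_device(raw):
    """Parse one raw SMBIOS type 17 structure (formatted area + strings)."""
    if len(raw) < 2:
        raise ValueError("structure too short")
    stype, length = raw[0], raw[1]
    if stype != 17 or length < 0x12 or len(raw) < length:
        raise ValueError("not a usable Memory Device structure")

    # The string table follows the formatted area; strings are NUL-terminated.
    strings = raw[length:].split(b"\x00")

    def string_at(index):
        # String indices are 1-based; 0 means "no string".
        if index == 0 or index > len(strings):
            return ""
        return strings[index - 1].decode("ascii", "replace").strip()

    (size,) = struct.unpack_from("<H", raw, 0x0C)
    return {
        "populated": size != 0,  # a Size of 0 means no module installed
        "device_locator": string_at(raw[0x10]),
        "bank_locator": string_at(raw[0x11]),
    }


def layout_problems(devices):
    """Return a list of complaints; an empty list passes this (weak) check."""
    problems = []
    seen = set()
    for dev in devices:
        if not dev["populated"]:
            continue
        key = (dev["bank_locator"], dev["device_locator"])
        if not dev["device_locator"]:
            problems.append("populated slot without a device locator")
        elif key in seen:
            problems.append("duplicate locator: %s / %s" % key)
        seen.add(key)
    return problems
```

Passing this check would only show that the slots can be told apart by
name; it says nothing about whether the channel, rank or interleave
mapping needed to go from an address to a slot is available.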

/Nils