Re: Hardware Error Kernel Mini-Summit

From: Andi Kleen
Date: Tue Jun 15 2010 - 07:41:44 EST


On Tue, Jun 15, 2010 at 10:06:33AM +0200, Nils Carlson wrote:

Hi Nils,

> Could you maybe provide some references on how DIMM layout
> could be read from DMI? I can't find anything nearly this specific,
> or is it something we're expecting to happen in future BIOSes?

The hardware (or BIOS) tells you which DIMM is affected. You read
the DIMMs from DMI and map the error to one of them using the
locators. The locator strings are not standardized, but there are
not too many different formats around, so parsers for them can be
implemented.
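
As a rough illustration of the locator matching (a sketch under
assumptions, not how mcelog does it), the Python below parses text in
the layout printed by `dmidecode -t 17` and matches the Bank Locator
strings against a small table of formats. The sample records, the two
formats in LOCATOR_FORMATS and the helper names are all hypothetical.

```python
import re

# Illustrative text in the layout printed by `dmidecode -t 17`.
# The handles and locator values are made up for this sketch.
SAMPLE = (
    "Handle 0x0040, DMI type 17, 34 bytes\n"
    "Memory Device\n"
    "\tArray Handle: 0x003E\n"
    "\tSize: 4096 MB\n"
    "\tLocator: DIMM_A1\n"
    "\tBank Locator: NODE 0 CHANNEL 0 DIMM 0\n"
    "\n"
    "Handle 0x0042, DMI type 17, 34 bytes\n"
    "Memory Device\n"
    "\tArray Handle: 0x003E\n"
    "\tSize: No Module Installed\n"
    "\tLocator: DIMM_B1\n"
    "\tBank Locator: NODE 0 CHANNEL 1 DIMM 0\n"
)


def parse_memory_devices(text):
    """Return one dict per DMI type 17 (Memory Device) record."""
    devices = []
    current = None
    for line in text.splitlines():
        m = re.match(r"Handle (0x[0-9A-Fa-f]+), DMI type (\d+)", line)
        if m:
            current = None
            if m.group(2) == "17":
                current = {"handle": int(m.group(1), 16)}
                devices.append(current)
            continue
        if current is not None and ":" in line:
            key, _, value = line.strip().partition(":")
            current[key.strip()] = value.strip()
    return devices


# Hypothetical bank-locator formats. Real strings differ per vendor, so
# a table like this has to be extended for each format encountered.
LOCATOR_FORMATS = [
    re.compile(r"NODE (?P<socket>\d+) CHANNEL (?P<channel>\d+) DIMM (?P<dimm>\d+)"),
    re.compile(r"CPU(?P<socket>\d+)_CH(?P<channel>\d+)_DIMM(?P<dimm>\d+)"),
]


def find_dimm(devices, socket, channel, dimm):
    """Return the Locator of the DIMM at the given position, or None."""
    want = {"socket": socket, "channel": channel, "dimm": dimm}
    for dev in devices:
        bank = dev.get("Bank Locator", "")
        for fmt in LOCATOR_FORMATS:
            m = fmt.fullmatch(bank)
            if m and {k: int(v) for k, v in m.groupdict().items()} == want:
                return dev.get("Locator")
    return None


devices = parse_memory_devices(SAMPLE)
print(find_dimm(devices, socket=0, channel=1, dimm=0))
```

The (socket, channel, dimm) triple stands in for whatever the hardware
or BIOS reports with the error; how that is obtained is system
specific and not shown here.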

Again, this does not give you the full layout, but it gives
you a "path to a DIMM" and a DIMM locator.

An alternative is to use the ACPI-based reporting mechanism,
which is needed on some systems. In this case the CPER record
gives you a reference to the DMI object of the DIMM.
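
To illustrate following such a reference, here is a minimal Python
sketch. It assumes the error record has already been decoded into a
16-bit SMBIOS handle (CPER decoding itself is not shown) and that the
raw SMBIOS structure table is available as bytes. It walks the table
and returns the Device Locator and Bank Locator strings of the type 17
(Memory Device) record with that handle. The function names and the
synthetic table are hypothetical test data.

```python
import struct


def walk_smbios(table):
    """Yield (type, handle, formatted_area, strings) for each structure."""
    offset = 0
    while offset + 4 <= len(table):
        stype, length, handle = struct.unpack_from("<BBH", table, offset)
        if length < 4:
            break  # malformed header, stop rather than loop forever
        strings_start = offset + length
        # The string area after the formatted area ends with a double NUL.
        end = table.find(b"\0\0", strings_start)
        if end < 0:
            break  # truncated table
        raw = table[strings_start:end]
        strings = [s.decode("ascii", "replace") for s in raw.split(b"\0")] if raw else []
        yield stype, handle, table[offset:strings_start], strings
        if stype == 127:  # end-of-table marker
            break
        offset = end + 2


def locator_for_handle(table, wanted):
    """Return (device_locator, bank_locator) for a type 17 handle, or None."""
    for stype, handle, area, strings in walk_smbios(table):
        if stype == 17 and handle == wanted and len(area) > 0x11:
            def string_at(index):
                # String references are 1-based; 0 means "no string".
                return strings[index - 1] if 0 < index <= len(strings) else None
            # Offsets 0x10 and 0x11 hold the two locator string indexes.
            return string_at(area[0x10]), string_at(area[0x11])
    return None


def make_memory_device(handle, locator, bank):
    """Build a synthetic type 17 record (test data only)."""
    area = bytearray(0x15)
    struct.pack_into("<BBH", area, 0, 17, len(area), handle)
    area[0x10] = 1  # Device Locator is string 1
    area[0x11] = 2  # Bank Locator is string 2
    return bytes(area) + locator.encode() + b"\0" + bank.encode() + b"\0\0"


END_OF_TABLE = struct.pack("<BBH", 127, 4, 0xFFFF) + b"\0\0"
TABLE = (
    make_memory_device(0x40, "DIMM_A1", "BANK 0")
    + make_memory_device(0x42, "DIMM_B1", "BANK 1")
    + END_OF_TABLE
)

print(locator_for_handle(TABLE, 0x42))
```

The handle lookup avoids parsing locator strings altogether, which is
the attraction of this path on systems where the firmware provides it.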

In principle DMI has more information (arrays, ranges etc.),
but in my experience that is not reliable enough to actually find
the DIMM on modern systems. You need hardware or BIOS help for this.

This is implemented in mcelog today.

>
> Also, there would probably need to be some standard describing
> different DIMM layouts in general, though maybe such a thing exists.

I don't think the goal is to have a full DIMM layout. This will
never replace your schematics.

The goal is to find which DIMM has a problem. So you have a path
and a locator. The path may give you some additional information
(e.g. the channel), but that's hard to generalize.

>
> In other words, there would have to be some way of ascertaining
> that the info you read from DMI is sufficient to decode MCEs so that
> a faulting DIMM can be identified. In an ideal world, this could
> be tested by some simple tool that could be run by the BIOS writers
> to test that they're providing the OS with sufficient info.

That's difficult to do in a general way; you will probably always
need some system-specific test plan.

-Andi

--
ak@xxxxxxxxxxxxxxx -- Speaking for myself only.