RE: [PATCH 1/3] mce: Add a msg string to the MCE tracepoint

From: Luck, Tony
Date: Wed Feb 29 2012 - 12:11:32 EST


>> - on Nehalem, the MCE status register encodes not only the error message; it
>> also encodes the DIMM that generated the error. So, it is possible to
>> completely decode the error on userspace, using only the MCE registers.
>
> Well, depending on what Tony wants to do there, either decode the error
> in the kernel and pass it on with the 'msg' arg or do the whole decoding
> in userspace.

For best results - we should decode right away in the kernel. Decoding later
requires that we carry a lot of additional information about the system
configuration at the time of the error. Consider the case of a hard error
(either fatal or recoverable). If the system reboots, then the DIMM
with the error should fail self test - and thus be mapped out of the system.
If the error analyzer doesn't realize that this has happened, it will be
very confused. Even if it does notice - the Sandy Bridge decoder won't be
able to check that the right DIMM was mapped out (since the configuration
registers it reads to map addresses to DIMMs will now be set for the new
configuration, with different mappings).
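
To make the remapping problem concrete, here is a toy sketch. It is not the
real Sandy Bridge decode logic (which reads the memory controller's
configuration registers and has to deal with interleaving); it assumes each
DIMM covers one contiguous address range. All names in it (dimm_range,
addr_to_slot, config_at_error, config_after_reboot) are hypothetical and
exist only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GIB (1ULL << 30)

/* Hypothetical, simplified map: one contiguous range per populated DIMM. */
struct dimm_range {
	uint64_t start;	/* first physical address served by this DIMM */
	uint64_t size;	/* bytes served by this DIMM */
	int slot;	/* physical slot number */
};

/* Configuration when the error was logged: slots 0, 1 and 2, 4 GiB each. */
static const struct dimm_range config_at_error[] = {
	{ 0 * GIB, 4 * GIB, 0 },
	{ 4 * GIB, 4 * GIB, 1 },
	{ 8 * GIB, 4 * GIB, 2 },
};

/*
 * Assumed configuration after reboot: slot 1 failed self test and was
 * mapped out, so slot 2 now backs the range that slot 1 used to serve.
 */
static const struct dimm_range config_after_reboot[] = {
	{ 0 * GIB, 4 * GIB, 0 },
	{ 4 * GIB, 4 * GIB, 2 },
};

/* Return the slot backing addr under the given map, or -1 if unmapped. */
static int addr_to_slot(const struct dimm_range *map, size_t n, uint64_t addr)
{
	assert(map != NULL);

	for (size_t i = 0; i < n; i++) {
		if (addr >= map[i].start && addr - map[i].start < map[i].size)
			return map[i].slot;
	}
	return -1;
}
```

In this sketch an error logged at physical address 5 GiB decodes to slot 1
when looked up in config_at_error, but to slot 2 when looked up in
config_after_reboot. A userspace tool that decodes after the reboot would
blame a healthy DIMM unless it also carried the old map around.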

-Tony