Re: Hardware Error Kernel Mini-Summit

From: Tony Luck
Date: Tue Jun 15 2010 - 18:33:58 EST

On Mon, Jun 14, 2010 at 11:56 PM, Andi Kleen <andi@xxxxxxxxxxxxxx> wrote:
> The way I envision it working is that an abstracted DIMM interface
> (or edac2 or whatever you want to call it) can be fed from any reasonable
> DIMM layout driver. This could be either DMI on x86 or some other
> driver. There would be nothing really x86-specific about that.

You could go one stage further and make DIMMs just one example of
a field replaceable unit. So the "error analysis subsystem" would keep track
of errors reported by any component (cpu, DIMM, I/O card, fan, power
supply, disk, ...). Each category could have its own "X errors per Y
interval" parameters that make sense for it.
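
To make the "X errors per Y interval" idea concrete, here is a rough
userspace sketch in C. It is only an illustration, not an existing kernel
interface: the names (fru_category, fru_policy, fru_report_error) and all
the threshold numbers are made up, and it uses a simple fixed window
where a real design might prefer a sliding window or a leaky bucket.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical categories of field replaceable unit (FRU). */
enum fru_category {
	FRU_CPU, FRU_DIMM, FRU_IO_CARD, FRU_FAN, FRU_PSU, FRU_DISK,
	FRU_NR_CATEGORIES
};

/* "X errors per Y interval" policy for one category. */
struct fru_policy {
	unsigned int max_errors;	/* X */
	uint64_t interval_secs;		/* Y */
};

/* Illustrative numbers only; real values would need tuning per platform. */
static const struct fru_policy fru_policies[FRU_NR_CATEGORIES] = {
	[FRU_CPU]     = {  5,  3600 },
	[FRU_DIMM]    = { 10, 86400 },
	[FRU_IO_CARD] = {  5,  3600 },
	[FRU_FAN]     = {  3,   600 },
	[FRU_PSU]     = {  2,  3600 },
	[FRU_DISK]    = { 20, 86400 },
};

/* Per-unit state: one of these per physical component being tracked. */
struct fru {
	enum fru_category category;
	uint64_t window_start;		/* time of first error in the window */
	unsigned int errors_in_window;
};

/*
 * Record one error against a unit at time 'now' (seconds).
 * Returns true once the unit has reached its category's threshold
 * within the current window, i.e. it looks like a replacement candidate.
 */
static bool fru_report_error(struct fru *fru, uint64_t now)
{
	const struct fru_policy *p;

	assert(fru && fru->category < FRU_NR_CATEGORIES);
	p = &fru_policies[fru->category];

	/* Start a new window on the first error or when the old one expired. */
	if (fru->errors_in_window == 0 ||
	    now - fru->window_start >= p->interval_secs) {
		fru->window_start = now;
		fru->errors_in_window = 0;
	}

	fru->errors_in_window++;
	return fru->errors_in_window >= p->max_errors;
}
```

The point of the table is that the analysis code stays the same for every
component type; only the per-category policy entry changes, and a new kind
of unit needs just a new enum value and a policy line.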

