Re: Hardware Error Kernel Mini-Summit

From: Russ Anderson
Date: Mon May 24 2010 - 11:55:30 EST


On Wed, May 19, 2010 at 10:30:17AM -0700, Tony Luck wrote:
>
> We are still in the dark ages for memory errors where the OS
> is expected to look at all the errors and figure out whether they
> represent any kind of meaningful pattern that requires some
> action to replace h/w components.

ia64 is good at detecting and recovering from uncorrectable memory
errors. x86 is significantly behind because, historically, it has
not been able to recover from them.

ia64 had the Intel-defined MCA spec, which specified the interaction
between SAL and the kernel. x86 has no similarly well-defined
model for how errors should be handled. It would be good to agree
on how they should be handled.

> -Tony

--
Russ Anderson, OS RAS/Partitioning Project Lead
SGI - Silicon Graphics Inc rja@xxxxxxx