Re: Linux & ECC memory

Eric Horst (erich@cac.washington.edu)
Thu, 14 Nov 1996 22:35:32 -0800 (PST)


> A more subtle issue is whether the ECC memory controller could report
> instances where ECC detection and successful correction took place. It
> would seem to be useful to provide a way for the OS to recognize that
> non-fatal memory errors have occurred, even though they were repaired.
>

This is actually the more interesting issue. It isn't so important that
the kernel know how to work around a bad spot that has been ECC corrected.
It is much more interesting to me to have the kernel tell me that a
correction was made (soft error) so that I have the opportunity to replace
the memory before it degrades and a hard error occurs.

This is really the whole advantage of ECC. It saves you from those pesky
one-bit errors and reports them so you can act before the problem worsens.

So my question would be "Does Linux log ECC corrections?" From the
responses I'd infer that one or more of the following applies: 1) nobody
really knows; 2) the boards are too new and support may still come; 3) the
boards are too brain dead to have a way to report this info and support
will never come.

I'd be interested in an informed answer, as we've got 45 new Linux server
boxes on order with ECC memory spec'd. Lack of kernel knowledge of ECC
won't hamper us, but this support would sure be nice.

--Eric