Re: Linux and memory ECC on i386

Craig Milo Rogers (rogers@isi.edu)
Sat, 07 Sep 96 12:22:32 PDT


>Only in the case that the ECC hardware detects an uncorrectable error
>system software gets involved. And at that point it's usually too late
>anyway ...

On really well-designed ECC systems, the hardware records soft
errors, the operating system logs them to disk, a support engineer
gets a report identifying the board and chip that's going bad when
there's a repeated pattern, and someone goes out on site to swap out
the failing component the next day.

Perhaps Linux 2.1 could klog soft ECC errors on hardware
platforms that support reporting them (maybe 2.0 already does so!)?
The rest is merely a Perl script or two.
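
To make "the rest" concrete, here is a rough sketch of the kind of script meant above, in Python rather than Perl. It counts soft errors per board and chip and flags the ones that repeat. The log line format ("ECC: corrected error board=N chip=NAME"), the function names, and the threshold are all made up for illustration; no kernel is known to emit lines in this form.

```python
import re
from collections import Counter

# Hypothetical log line format, assumed for this sketch only.
# Real hardware and kernels would report soft errors differently.
ECC_LINE = re.compile(r"ECC: corrected error board=(\d+) chip=(\w+)")


def count_soft_errors(lines):
    """Count corrected (soft) ECC errors per (board, chip) pair."""
    counts = Counter()
    for line in lines:
        match = ECC_LINE.search(line)
        if match:
            counts[(int(match.group(1)), match.group(2))] += 1
    return counts


def suspects(counts, threshold=3):
    """Return (board, chip) pairs seen at least `threshold` times, worst first."""
    return sorted(
        (key for key, n in counts.items() if n >= threshold),
        key=lambda key: (-counts[key], key),
    )
```

Fed the lines of a kernel log file, count_soft_errors() gives the per-chip tallies and suspects() gives the list a support engineer would want in the morning report. The threshold of 3 is arbitrary; a real site would tune it, and would probably also look at how closely the errors cluster in time.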

Craig Milo Rogers