Parity/ECC Error handling in Linux

From: Jean Wolter (jean.wolter@inf.tu-dresden.de)
Date: Tue May 30 2000 - 11:25:52 EST


Hello,

A few days ago I reported an oops in find_buffer, and several people
recommended checking our memory, since most of the time such oopses
vanish once the memory is replaced.

But there were several reasons why we didn't think we had a problem
with faulty memory:

1. We have never seen one of the famous gcc Sig11 problems (and we use
the machine heavily for kernel compilations).

2. We have an AMI Goliath board with four PPro CPUs, an Orion chipset
and ECC memory (ECC is enabled in the BIOS). And we have never seen
anything in the log files reporting corrected 1-bit errors or
uncorrectable 2-bit errors.
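
To make the "corrected 1-bit / uncorrectable 2-bit" distinction
concrete, here is a toy sketch of a SECDED code (single error
correction, double error detection). It is an extended Hamming(8,4)
code protecting 4 data bits, chosen only because it is small; real ECC
memory protects much wider words in the chipset, and this is not meant
to describe what the Orion chipset actually does. All names
(ecc_encode, ecc_decode, enum ecc_result) are made up for illustration.

```c
#include <stdint.h>

enum ecc_result { ECC_OK, ECC_CORRECTED, ECC_UNCORRECTABLE };

/* Data bits live at codeword positions 3, 5, 6, 7; check bits at
 * positions 1, 2, 4; bit 0 holds the parity over the whole word. */
static const unsigned data_pos[4] = { 3, 5, 6, 7 };

/* XOR of the positions (1..7) of all set bits; zero for a valid word. */
static unsigned syndrome(uint8_t w)
{
    unsigned s = 0;
    for (unsigned i = 1; i < 8; i++)
        if ((w >> i) & 1u)
            s ^= i;
    return s;
}

/* Parity over all eight bits; zero for a valid word. */
static unsigned parity8(uint8_t w)
{
    unsigned p = 0;
    for (unsigned i = 0; i < 8; i++)
        p ^= (w >> i) & 1u;
    return p;
}

uint8_t ecc_encode(uint8_t nibble)
{
    uint8_t w = 0;
    for (unsigned k = 0; k < 4; k++)
        if ((nibble >> k) & 1u)
            w |= (uint8_t)(1u << data_pos[k]);
    /* Set check bits 1, 2, 4 so that the syndrome becomes zero. */
    unsigned s = syndrome(w);
    for (unsigned b = 1; b <= 4; b <<= 1)
        if (s & b)
            w |= (uint8_t)(1u << b);
    /* Set bit 0 so that the overall parity becomes even. */
    if (parity8(w))
        w |= 1u;
    return w;
}

enum ecc_result ecc_decode(uint8_t w, uint8_t *nibble)
{
    unsigned s = syndrome(w);
    enum ecc_result r = ECC_OK;

    if (parity8(w)) {
        /* Odd overall parity: exactly one bit flipped, at position s
         * (s == 0 means the overall parity bit itself). Flip it back. */
        w ^= (uint8_t)(1u << s);
        r = ECC_CORRECTED;
    } else if (s != 0) {
        /* Even parity but nonzero syndrome: two bits flipped. */
        return ECC_UNCORRECTABLE;
    }

    *nibble = 0;
    for (unsigned k = 0; k < 4; k++)
        if ((w >> data_pos[k]) & 1u)
            *nibble |= (uint8_t)(1u << k);
    return r;
}
```

With this scheme any single flipped bit is repaired silently (which is
why a log entry is the only way to notice it), while any two flipped
bits are detected but cannot be repaired.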

But now I have checked the source and didn't find any code that
handles memory errors. AFAIK the P6 uses the machine check exception
(MCE) to report problems, and the machine check architecture (MCA)
provides information about the faulty subsystem. Is there any reason
that Linux doesn't use MCE and MCA? Are memory errors still reported
using NMI?
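
For reference, a minimal sketch of what using MCA involves on the
decoding side. It assumes the layout given in Intel's manuals: CPUID
leaf 1 reports MCE in EDX bit 7 and MCA in EDX bit 14, and each bank's
MCi_STATUS register carries a valid bit (63), an overflow bit (62), an
uncorrected-error bit (61), an address-valid bit (58), a
processor-context-corrupt bit (57) and an MCA error code in bits 15:0.
The struct and function names are hypothetical, and the code only
decodes values handed to it; actually reading the registers needs
rdmsr in ring 0, which is not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* CPUID leaf 1, EDX feature flags. */
#define CPUID_EDX_MCE (1u << 7)   /* machine check exception */
#define CPUID_EDX_MCA (1u << 14)  /* machine check architecture */

/* Decoded view of one bank's MCi_STATUS value (hypothetical type). */
struct mc_status {
    bool valid;        /* bit 63: register holds a logged error */
    bool overflow;     /* bit 62: an earlier error was overwritten */
    bool uncorrected;  /* bit 61: hardware could not correct it */
    bool addr_valid;   /* bit 58: MCi_ADDR holds the error address */
    bool ctx_corrupt;  /* bit 57: processor context may be corrupt */
    uint16_t mca_code;   /* bits 15:0: architectural error code */
    uint16_t model_code; /* bits 31:16: model-specific error code */
};

/* True if the EDX value from CPUID leaf 1 advertises both features. */
bool cpu_has_mce_and_mca(uint32_t edx)
{
    return (edx & CPUID_EDX_MCE) && (edx & CPUID_EDX_MCA);
}

/* Split a raw 64-bit MCi_STATUS value into its fields. */
void decode_mc_status(uint64_t status, struct mc_status *out)
{
    assert(out != NULL);
    out->valid       = (status >> 63) & 1;
    out->overflow    = (status >> 62) & 1;
    out->uncorrected = (status >> 61) & 1;
    out->addr_valid  = (status >> 58) & 1;
    out->ctx_corrupt = (status >> 57) & 1;
    out->mca_code    = (uint16_t)(status & 0xffff);
    out->model_code  = (uint16_t)((status >> 16) & 0xffff);
}
```

A handler or polling routine would walk the banks, decode every status
value with the valid bit set, and log corrected errors (valid set,
uncorrected clear) separately from uncorrected ones.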

Or is it just that nobody implemented it in Linux? Is anybody working
on it? Is parity/ECC handling not relevant?

Jean
