Re: ECC error in 2.5.64 + some patches

From: Chris Wedgwood (cw@f00f.org)
Date: Thu Mar 27 2003 - 12:00:39 EST


On Thu, Mar 27, 2003 at 08:02:20AM -0800, Larry McVoy wrote:

> My guess is that this means there was a memory error and ECC fixed
> it.

Nope.

There is an ecc driver for RAM and you'll be able to detect these
using that. RAM ECC errors in my experience don't cause MCEs, usually
the CPU never notices.

> The only problem is that I'm reasonably sure that there isn't ECC on
> these DIMMs.

Dump the SPD and you can check... usually the BIOS will tell you too.

> Does anyone have the table of error codes to explanations? Google
> didn't find anything for this one.

as someone else pointed our, parsemce is what you want

> Message from syslogd@slovax at Thu Mar 27 05:53:49 2003 ...
> slovax kernel: Bank 1: 9000000000000151

Status: (9000000000000151) Restart IP valid.

*Exactly* what this means I don't know --- but I'm guessing the CPU is
overheating. Check fans, air-flow, etc. and see if that helps. So
far whenever I've seen the above problem it's *ALWAYS* been related to
the CPU getting too hot.

  --cw
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Mon Mar 31 2003 - 22:00:28 EST