MCE Erros

From: Thomas Glanzmann
Date: Thu Mar 22 2007 - 15:51:04 EST


Hello,
I have two Dual Opteron Machines where I get two MCE errors on. The
first one is:

MCE 0
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 0 4 northbridge TSC edc587de6e99
ADDR 1001a0000
Northbridge GART error
bit61 = error uncorrected
TLB error 'generic transaction, level generic'
STATUS a40000000005001b MCGSTATUS 0

I see this error exactly 8 times. What does 'GART' mean?

And here is another one another box:

MCE 0
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 1 4 northbridge TSC f23151075b21d
ADDR b8898250
Northbridge Chipkill ECC error
Chipkill ECC syndrome = f858
bit32 = err cpu0
bit46 = corrected ecc error
bit62 = error overflow (multiple errors)
bus error 'local node origin, request didn't time out
generic read mem transaction
memory access, level generic'
STATUS d42c4001f8080813 MCGSTATUS 0

How do I identify the broken Memory Modules?

Thomas
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/