Re: Request for MCE decode (AMD Barcelona, fam 10h)

From: Jeroen van Rijn
Date: Sat Sep 06 2008 - 23:16:19 EST


On Sun, Sep 7, 2008 at 4:32 AM, Tony Vroon <tony@xxxxxxxxx> wrote:
> On a Tyan-based system with intermittent but persistent instability, I
> have finally received a message that something might actually be wrong
> in hardware. Could you decode:
>
> MCE 0
> HARDWARE ERROR. This is *NOT* a software problem!
> Please contact your hardware vendor
> CPU 0 BANK 4 MISC c000000001000000
> STATUS fa00002000020c0f MCGSTATUS 0
> MCE 1
> HARDWARE ERROR. This is *NOT* a software problem!
> Please contact your hardware vendor
> CPU 4 BANK 4 MISC c000000001000000
> STATUS fa00000000070f0f MCGSTATUS 0

Hi Tony,

Not easily, and it's too late to parse
arch/x86/kernel/cpu/mcheck/mce_64.c and find out what it means before
I nod off. Still, before I sign off, have you tried running "mcelog
--ascii"? It needs to be run on the machine the check occurred on. It
might give you something to go on before the cavalry arrives.
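
In the meantime, the top flag bits of the two STATUS values can be split
out by hand. Below is a minimal sketch, assuming the generic x86
MCi_STATUS layout (bit 63 VAL down to bit 57 PCC, MCA error code in bits
15:0). It does not attempt the family 10h bank 4 specifics, and the names
FLAGS and decode_status are made up for illustration; nothing here comes
from mce_64.c or mcelog.

```python
# Assumed generic x86 MCi_STATUS flag positions; not family 10h specific.
FLAGS = {
    "VAL": 63,    # register contents are valid
    "OVER": 62,   # an earlier error was overwritten
    "UC": 61,     # error was not corrected by hardware
    "EN": 60,     # reporting was enabled for this error
    "MISCV": 59,  # MISC register holds valid data
    "ADDRV": 58,  # ADDR register holds valid data
    "PCC": 57,    # processor context may be corrupt
}


def decode_status(status):
    """Split a 64-bit STATUS value into flag bits and raw code fields."""
    fields = {name: bool((status >> bit) & 1) for name, bit in FLAGS.items()}
    # Low 16 bits: MCA error code.
    fields["mca_error_code"] = status & 0xFFFF
    # Bits 31:16 are model specific; their meaning depends on CPU and bank.
    fields["bits_31_16"] = (status >> 16) & 0xFFFF
    return fields


if __name__ == "__main__":
    # The two STATUS values from the quoted report.
    for raw in (0xFA00002000020C0F, 0xFA00000000070F0F):
        d = decode_status(raw)
        set_flags = ",".join(name for name in FLAGS if d[name])
        print(f"{raw:016x}: flags={set_flags} "
              f"code={d['mca_error_code']:04x} "
              f"bits31_16={d['bits_31_16']:04x}")
```

This only separates the raw fields. Turning the error code and the
model-specific bits into a named fault still needs the vendor
documentation or mcelog.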

Best regards,
Jeroen.