Re: CPU Context corruption

From: Willy Tarreau
Date: Sun Sep 12 2004 - 01:38:26 EST


On Sat, Sep 11, 2004 at 01:40:55PM +0100, Alan Cox wrote:
> On Sad, 2004-09-11 at 12:19, Nigel Kukard wrote:
> > What does this error mean?
> >
> >
> > CPU 0: Machine Check Exception: 0000000000000004
> > Bank 0: 820000001040080F
> >
> >
> > I have a Matsonic 9097c motherboard, 2.4Ghz prescott celeron cpu. This
> > error seems to be random. We have replaced the motherboard & cpu to no
> > avail.
>
> It normally indicates a hardware problem. The precise meaning of all the
> bits is in the Intel chip docs (volume 3). If you've swapped the
> mainboard/cpu it might just be bad RAM.

He can also decode it precisely with Dave Jones' parsemce tool:

http://www.kernel.org/pub/linux/kernel/people/davej/tools/

For the values above, it currently says:

Status: (4) Machine Check in progress.
Restart IP invalid.
parsebank(0): 820000001040080f @ 0
External tag parity error
CPU state corrupt. Restart not possible
Bus and interconnect error
Participation: Local processor originated request
Timeout: Request did not timeout
Request: Generic error
Transaction type : Invalid
Memory/IO : Other

Since it reports the transaction as neither memory nor I/O, I think it might
be related to a PCI parity error with some card, either during transfers or
during config access.
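
For anyone who wants to check the architectural part of that decode by hand,
here is a rough Python sketch. It is not parsemce's code; the bit layout is
my reading of the machine-check chapter of the Intel manual (volume 3), and
all function and table names are made up for illustration. It assumes the
classic "0000 1PPT RRRR IILL" bus/interconnect error format and does not
try to interpret the model-specific bits (16-31), which is where parsemce
gets things like the tag parity message.

```python
# Hypothetical helper names; bit positions assumed from the Intel SDM vol. 3
# machine-check architecture chapter, not taken from parsemce itself.
MCG_BITS = {"RIPV": 0, "EIPV": 1, "MCIP": 2}
STATUS_BITS = {"VAL": 63, "OVER": 62, "UC": 61, "EN": 60,
               "MISCV": 59, "ADDRV": 58, "PCC": 57}

PARTICIPATION = {0: "local processor originated request",
                 1: "local processor responded to request",
                 2: "local processor observed error as third party",
                 3: "generic"}
REQUEST = {0: "generic error", 1: "generic read", 2: "generic write",
           3: "data read", 4: "data write", 5: "instruction fetch",
           6: "prefetch", 7: "eviction", 8: "snoop"}
MEM_IO = {0: "memory access", 1: "reserved", 2: "I/O", 3: "other transaction"}


def flags(value, bits):
    """Return {name: bool} for each named bit position in value."""
    return {name: bool((value >> pos) & 1) for name, pos in bits.items()}


def decode_mcg_status(value):
    """Decode the global status word (the number after 'Machine Check Exception:')."""
    return flags(value, MCG_BITS)


def decode_bank_status(status):
    """Decode the architectural fields of a 64-bit per-bank status word."""
    code = status & 0xFFFF
    out = {"flags": flags(status, STATUS_BITS),
           "mca_code": code,
           "model_specific": (status >> 16) & 0xFFFF}
    # Bus and interconnect errors: bits 15..11 are 00001.
    if code >> 11 == 1:
        out["bus_error"] = {
            "participation": PARTICIPATION[(code >> 9) & 0x3],
            "timeout": bool((code >> 8) & 0x1),
            "request": REQUEST.get((code >> 4) & 0xF, "reserved"),
            "mem_io": MEM_IO[(code >> 2) & 0x3],
            "level": code & 0x3,
        }
    return out


if __name__ == "__main__":
    print(decode_mcg_status(0x4))
    print(decode_bank_status(0x820000001040080F))
```

With the values from the report, this gives MCIP set and RIPV clear for the
global status, and VAL and PCC set for bank 0, with the low 16 bits (0x080F)
falling in the bus/interconnect class and the memory/IO field set to "other".
That lines up with the parsemce output above.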

Regards,
Willy
