Re: Kernel panic: Machine check exception

From: Nix
Date: Sat Nov 19 2005 - 17:24:22 EST


On 19 Nov 2005, Alan Cox stipulated:
> On Sad, 2005-11-19 at 12:54 -0800, Avuton Olrich wrote:
>> Is there a good way to narrow it down? I guess running a badmem
>> program would be good to start with, otherwise ...(?).
>
> A memory test may be worth doing but most machine checks indicate the
> fault is more serious than bad memory.

Some of them are certainly not very serious in their effects. I get this
persistently, once every couple of months, on one of my machines (an
Athlon 4 with 768Mb of ECC RAM):

kernel: MCE: The hardware reports a non fatal, correctable incident occurred on CPU 0.
kernel: Bank 2: 940040000000017a

and I think it *is* a single slightly wobbly bit (the wobble being too
slight for memtest to find).

Nothing discernible has ever gone wrong as a result of that, right down
to repeated GCC enable-checking bootstrap-and-tests completing without
error (well, without any more than the expected XFAILs).

(I'm just *assuming* the `Bank 2' in this message refers to a bank of
RAM. Doubtless this assumption will now be shown to be utterly wrong and
a sign of terminal foolishness on my part... ;) )

--
`Y'know, London's nice at this time of year. If you like your cities
freezing cold and full of surly gits.' --- David Damerell

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/