Re: AMD A10: MCE Instruction Cache Error

From: Alexander Holler
Date: Tue Nov 06 2012 - 06:18:11 EST


On 06.11.2012 10:10, Borislav Petkov wrote:
> On Sun, Nov 04, 2012 at 06:19:32PM +0100, Alexander Holler wrote:
>> I was remotely logged in and there aren't that many faults which
>> lead to a complete standstill of the hw (no reset).

> Right, can you retry triggering the freeze without the fglrx driver?
> Simply remove it completely so that even the possibility to load it is
> not there.

Will do. But I don't think it is fglrx. I've been using it for several years (just with an external graphics card before) and have never had a problem with it. Besides that, nothing happened on the display during the hangs; I was logged out locally and only had a remote ssh session open.

>> But as you said I can't know, the only thing I know is that a box
>> with a new mb, memory and APU came to a complete standstill, and
>> that shortly after I received an emergency message which told me
>> that a bit inside the CPU switched unexpectedly. Adding to that, the
>> box was doing the same thing as when it received the MCE: a backup
>> from a SATA-attached SSD to a USB3 HD, which includes compression and
>> encryption and keeps all cores at work most of the time for several
>> hours.

> So do you get that MCE each time you execute that same workload?

No, up to now the MCE was only visible once. But stressing the box yesterday (with loads of 3 for several hours and such) revealed some other serious failures, all of which look like what happens when the cache (or memory) is broken (I don't know how many bit errors in the cache can be corrected before something else happens, or what happens then). E.g. the checksum of a backup was wrong, and bzip2 failed with an error which it suggests is due to a HW failure like bad RAM (I've never seen that error from bzip2 before).
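
For anyone who wants to try to provoke the same kind of failure, here is a minimal sketch of such a check. It is not the backup script I actually use; the function names and sizes are made up for illustration. It compresses and decompresses random data with bzip2 and compares SHA-256 checksums, so any mismatch or decompression error on a loop like this would point to corruption somewhere between CPU, cache and RAM rather than to the data itself.

```python
import bz2
import hashlib
import os


def roundtrip_ok(payload: bytes) -> bool:
    """Compress and decompress payload with bzip2, then compare SHA-256 digests.

    Hypothetical helper: returns False if decompression fails or if the
    restored data does not match the original.
    """
    expected = hashlib.sha256(payload).hexdigest()
    try:
        restored = bz2.decompress(bz2.compress(payload))
    except (OSError, ValueError):
        # bz2 reports corrupted streams as errors; count that as a failure.
        return False
    return hashlib.sha256(restored).hexdigest() == expected


def stress(rounds: int = 10, size: int = 1 << 20) -> int:
    """Run several round trips on fresh random data and count the failures."""
    failures = 0
    for _ in range(rounds):
        if not roundtrip_ok(os.urandom(size)):
            failures += 1
    return failures


if __name__ == "__main__":
    # Sizes and round counts are arbitrary; raise them (and run one process
    # per core) to approximate a multi-hour load.
    print("failed round trips:", stress(rounds=20))
```

On healthy hardware the failure count should stay at zero no matter how long it runs; a single process like this only loads one core, so it is a much weaker test than the real backup.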

I've just done a memory test using memtest86+-4.20 for about 7h (3 complete passes of all 16GB), no errors, so the new memory itself seems to be ok.

I will now do tests with fglrx left off.

Regards,

Alexander
