Re: AMD A10: MCE Instruction Cache Error

From: Alexander Holler
Date: Sun Nov 04 2012 - 12:19:49 EST


On 04.11.2012 16:21, Borislav Petkov wrote:
> On Sat, Nov 03, 2012 at 11:45:25AM +0100, Alexander Holler wrote:
>> Hmm, exactly that just happened twice in a row. Unfortunately the
>> screen was already blanked (screen saver mode), so I couldn't see
>> any message, if there was one. Just a dead box (not overclocked, I
>> don't do that; I had even enabled the power saving mode in the BIOS,
>> which seems to mean max. 3800 MHz). I think I should start getting
>> nervous. :(

> How do you know this happened twice if you couldn't see any message?

I was logged in remotely, and there aren't that many faults that bring the hardware to a complete standstill (no reset).

But as you said, I can't know for sure. The only thing I know is that a box with a new mainboard, memory and APU came to a complete standstill, and that this happened shortly after I received an emergency message telling me that a bit inside the CPU had flipped unexpectedly. On top of that, the box was doing the same thing it was doing when it got the MCE: a backup from a SATA-attached SSD to a USB 3 hard disk, including compression and encryption, which keeps all cores busy most of the time for several hours.

> Also, can you enable netconsole or serial console, if possible, and try
> to catch the full dmesg from boot up until it happens.

Since I was logged in over the network, I know it wasn't the same MCE as before (just a disconnect and dead hardware). But I don't know what else it was. And since I was recently hit by a broken RAM module, which was a pain to find, I'm not very happy about having to go through similar pain again with new hardware.

The probability of getting a working hardware and software combination has just become too low in recent years. All the (IT) companies would do better to give the money they now spend on their lawyers to their QA and engineering departments instead.

Sorry for the rant. Even though I'm used to living with hardware and software errors (as a software developer), I'm just a bit annoyed at the moment. ;)

I will set up something to monitor the box over the serial console and let it back itself up continuously, trying to catch some useful information.

Regards,

Alexander
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/