Re: AMD A10: MCE Instruction Cache Error

From: Alexander Holler
Date: Sat Nov 03 2012 - 06:46:23 EST


On 03.11.2012 05:49, Borislav Petkov wrote:
On Fri, Nov 02, 2012 at 02:53:45PM +0100, Alexander Holler wrote:
On 02.11.2012 11:50, Alexander Holler wrote:
Hello,

I've just got the following on an AMD A10 5800K:

------
[ 8395.999581] [Hardware Error]: CPU:0 MC1_STATUS[-|CE|MiscV|-|AddrV|-|-]: 0x8c00002000010151
[ 8395.999586] [Hardware Error]: MC1_ADDR: 0x0000ffffa00e1203
[ 8395.999588] [Hardware Error]: Instruction Cache Error: Parity error during data load from IC.
[ 8395.999590] [Hardware Error]: cache level: L1, tx: INSN, mem-tx: IRD
------
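
As a rough cross-check of the decoded lines above, the raw MC1_STATUS value can be picked apart by hand. The sketch below is only an illustration, not the kernel's decoder: it assumes the architectural MCi_STATUS flag positions (Val=63, Over=62, UC=61, En=60, MiscV=59, AddrV=58, PCC=57) and the "0000 0001 RRRR TTLL" memory-hierarchy error code layout, and the name decode_status and the lookup tables are made up for this example.

```python
# Assumed architectural MCi_STATUS flag positions (bit numbers).
FLAGS = {"Val": 63, "Over": 62, "UC": 61, "En": 60,
         "MiscV": 59, "AddrV": 58, "PCC": 57}

# Assumed field names for the memory-hierarchy error code
# (layout 0000 0001 RRRR TTLL in the low 16 bits).
TT = ["INSN", "DATA", "GEN", "RESV"]          # transaction type
LL = ["RESV", "L1", "L2", "L3/GEN"]           # cache level
RRRR = ["GEN", "RD", "WR", "DRD", "DWR", "IRD", "PRF", "EV", "SNP"]


def decode_status(status):
    """Hypothetical helper: split a 64-bit MCi_STATUS value into fields."""
    flags = {name: bool((status >> bit) & 1) for name, bit in FLAGS.items()}
    ec = status & 0xFFFF
    out = {
        "flags": flags,
        # Valid and not uncorrected: this is what the log prints as "CE".
        "corrected": flags["Val"] and not flags["UC"],
        "error_code": ec,
    }
    if (ec & 0xFF00) == 0x0100:               # memory-hierarchy format
        r = (ec >> 4) & 0xF
        out["mem_tx"] = RRRR[r] if r < len(RRRR) else "RESV"
        out["tx"] = TT[(ec >> 2) & 0x3]
        out["level"] = LL[ec & 0x3]
    return out


info = decode_status(0x8C00002000010151)      # value from the log above
print(info["corrected"], info["level"], info["tx"], info["mem_tx"])
```

Under these assumptions the UC and PCC bits are clear while MiscV and AddrV are set, and the low 16 bits (0x0151) come out as L1 / INSN / IRD, which agrees with the flags and the "cache level" line the kernel printed.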

Kernel is 3.6.5, MB is an Asus F2A85-M with BIOS 5103 (the latest).

...
So now I have two questions:

- First, if the error is something I should ask AMD about,

Not really, it is a single bit flip which got corrected, simply watch
out if you get more of those.

- Second, if the kernel could mention that it is a recoverable
error. And if so, and if such errors aren't something to panic about
(e.g. it isn't unusual to receive them), if the kernel could output
that message with another priority.

As I said above, it got corrected. If it were critical, it would've
either panicked or you wouldn't have seen it at all (probably only after
reboot).

Hmm, exactly that just happened twice in a row. Unfortunately the screen was already blanked (screen saver), so I couldn't see any message, if there was one. Just a dead box (not overclocked, I don't do that; I had even enabled the power saving mode in the BIOS, which seems to mean max. 3800 MHz). I think I should start getting nervous. :(

What I meant by another priority is using something other than pr_emerg(), because pr_emerg() causes the message to be displayed on every console, at least on my F17 with default settings.

Of course, I'm happy it was displayed using pr_emerg(), so I didn't miss it. Now I know that even if ECC isn't available for users who don't want or need power-hungry and loud servers, at least some parity is used to verify operations on the internal memory (cache).

But on the other hand, if that message isn't really critical, something other than pr_emerg() should be used.
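
To illustrate why the priority matters, here is a small sketch of the usual printk rule that a message is written to the console when its level is numerically lower than console_loglevel. The level numbers are the standard syslog/printk severities; the default of 7 and the helper name reaches_console are assumptions made for this example, and the broadcast to every terminal that I saw may well come from the syslog daemon's handling of emerg messages rather than from the kernel itself.

```python
# Standard printk/syslog severities: lower number = more severe.
KERN_LEVELS = {
    0: "emerg", 1: "alert", 2: "crit", 3: "err",
    4: "warning", 5: "notice", 6: "info", 7: "debug",
}


def reaches_console(msg_level, console_loglevel=7):
    """Hypothetical helper: True if a message of msg_level is shown.

    Assumes the rule "print when msg_level < console_loglevel";
    the default of 7 is an assumption, not taken from my system.
    """
    return msg_level < console_loglevel


# Compare a quiet console (loglevel 1) with the assumed default (7).
for level, name in KERN_LEVELS.items():
    print(name, reaches_console(level, 1), reaches_console(level, 7))
```

With these assumptions an emerg (0) message gets through even when console_loglevel is turned down to 1, while the same text logged at err (3) or warning (4) would be hidden on such a quiet console and still end up in the log.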

Thanks for the answer.

Regards,

Alexander
