Re: AMD A10: MCE Instruction Cache Error

From: Alexander Holler
Date: Fri Nov 02 2012 - 09:53:46 EST


On 02.11.2012 11:50, Alexander Holler wrote:
Hello,

I've just got the following on an AMD A10 5800K:

------
[ 8395.999581] [Hardware Error]: CPU:0 MC1_STATUS[-|CE|MiscV|-|AddrV|-|-]: 0x8c00002000010151
[ 8395.999586] [Hardware Error]: MC1_ADDR: 0x0000ffffa00e1203
[ 8395.999588] [Hardware Error]: Instruction Cache Error: Parity error during data load from IC.
[ 8395.999590] [Hardware Error]: cache level: L1, tx: INSN, mem-tx: IRD
------
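For reference, the fields in the last log line come from the low 16 bits of MC1_STATUS (here 0x0151). As I understand the AMD MCA error-code format, memory errors follow the pattern 0000 0001 RRRR TTLL (RRRR = memory transaction, TT = transaction type, LL = cache level). Below is a minimal Python sketch that reproduces that decoding. The lookup tables mirror the names the Linux AMD decoder prints; the function name decode_mem_error_code is hypothetical, and this is not the kernel's code.

```python
# Names printed by the Linux AMD MCE decoder for memory-type error codes.
TT_MSGS = ["INSN", "DATA", "GEN", "RESV"]        # transaction type (TT)
LL_MSGS = ["RESV", "L1", "L2", "L3/GEN"]         # cache level (LL)
RRRR_MSGS = ["GEN", "RD", "WR", "DRD", "DWR",    # memory transaction (RRRR)
             "IRD", "PRF", "EV", "SNP"]


def decode_mem_error_code(status: int) -> str:
    """Decode bits 15:0 of MCi_STATUS, assuming a memory-type error code."""
    ec = status & 0xFFFF                  # MCA error code, bits 15:0
    if (ec & 0xFF00) != 0x0100:           # memory errors: 0000 0001 RRRR TTLL
        raise ValueError(f"not a memory error code: {ec:#06x}")
    ll = ec & 0x3                         # LL, bits 1:0
    tt = (ec >> 2) & 0x3                  # TT, bits 3:2
    rrrr = (ec >> 4) & 0xF                # RRRR, bits 7:4
    mem_tx = RRRR_MSGS[rrrr] if rrrr < len(RRRR_MSGS) else "RESV"
    return f"cache level: {LL_MSGS[ll]}, tx: {TT_MSGS[tt]}, mem-tx: {mem_tx}"


print(decode_mem_error_code(0x8C00002000010151))
# cache level: L1, tx: INSN, mem-tx: IRD
```

The decoded "tx: INSN, mem-tx: IRD" (an instruction fetch hitting the L1) is consistent with IC meaning the instruction cache.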

Kernel is 3.6.5, MB is an Asus F2A85-M with BIOS 5103 (the latest).

Can someone enlighten me about what might be wrong with my (new) system?
(memtest didn't show any errors.)

Which IC is meant? As far as I know, this processor doesn't support ECC,
so I wonder where that parity error comes from.

I assume IC means Instruction Cache. ;)

As the kernel didn't reboot or halt, this seems to have been a correctable error.
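That reading can be checked against the status value itself: bit 61 (UC) is clear, so the hardware reports the error as corrected, and bit 57 (PCC) is clear, so the processor context is not marked as corrupt. Below is a rough Python sketch, assuming the architectural MCi_STATUS bit layout. format_flags mimics the flag field in the log line; treating bits 44 and 43 as the family 15h Deferred/Poison flags is an assumption. rough_severity is a hypothetical, heavily simplified stand-in for the kernel's real severity grading.

```python
# Architectural MCi_STATUS bits (same positions on AMD and Intel).
MCI_STATUS_VAL = 1 << 63    # register holds a valid error
MCI_STATUS_OVER = 1 << 62   # an earlier error was overwritten
MCI_STATUS_UC = 1 << 61     # error was NOT corrected by hardware
MCI_STATUS_MISCV = 1 << 59  # MCi_MISC holds extra information
MCI_STATUS_ADDRV = 1 << 58  # MCi_ADDR holds the error address
MCI_STATUS_PCC = 1 << 57    # processor context may be corrupt
# Assumption: family 15h extra flags as printed by the kernel decoder.
MCI_STATUS_DEFERRED = 1 << 44
MCI_STATUS_POISON = 1 << 43


def format_flags(status: int) -> str:
    """Build the [Over|UE/CE|MiscV|PCC|AddrV|Deferred|Poison] field."""
    fields = [
        "Over" if status & MCI_STATUS_OVER else "-",
        "UE" if status & MCI_STATUS_UC else "CE",
        "MiscV" if status & MCI_STATUS_MISCV else "-",
        "PCC" if status & MCI_STATUS_PCC else "-",
        "AddrV" if status & MCI_STATUS_ADDRV else "-",
        "Deferred" if status & MCI_STATUS_DEFERRED else "-",
        "Poison" if status & MCI_STATUS_POISON else "-",
    ]
    return "[" + "|".join(fields) + "]"


def rough_severity(status: int) -> str:
    """Hypothetical, simplified classification; not the kernel's logic."""
    if not status & MCI_STATUS_VAL:
        return "no error logged"
    if status & MCI_STATUS_PCC:
        return "fatal: processor context corrupt"
    if status & MCI_STATUS_UC:
        return "uncorrected"
    return "corrected"


status = 0x8C00002000010151
print(format_flags(status))    # [-|CE|MiscV|-|AddrV|-|-]
print(rough_severity(status))  # corrected
```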

Which leads me to another question. I have mcelog running, but it doesn't seem to receive the error. With my previous Intel hardware and an older kernel, mcelog received MCEs (trip temperature). But since the kernel now decodes these messages itself, that no longer seems to happen: mcelog is silent, and instead I've seen the above message on all my consoles.

So now I have two questions:

- First, whether the error is something I should ask AMD about.

- Second, whether the kernel could mention that it is a recoverable error. And if so, and if such errors aren't a reason to panic (e.g. because they aren't unusual), whether the kernel could print that message with a lower priority.

Regards,

Alexander