Re: Request for MCE decode (AMD Barcelona, fam 10h)

From: Jeroen van Rijn
Date: Mon Sep 08 2008 - 07:13:56 EST


On Mon, Sep 8, 2008 at 12:55 PM, Andi Kleen <andi@xxxxxxxxxxxxxx> wrote:
> Tony Vroon <tony@xxxxxxxxx> writes:
>
>> HARDWARE ERROR. This is *NOT* a software problem!
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>> Please contact your hardware vendor
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
>> I realize that the linux kernel may be entirely blameless in this
>> situation,
>
> It is, like mcelog told you.
>
>> but I'd like to have some peer insight before I run after
>> vendors.
>
> It unfortunately turns out that mcelog logging is a tricky
> psychological problem. How should the warning above have
> looked like so that you would not have required "peer insight"
> and actually just contacted your hardware vendor?

I suppose mcelog might be extended to point at possible tools to get a
second opinion, in case the admin would like to be entirely certain.
I can understand their position when the vendor asks them whether it's
really the hardware and what tests they've run to rule out software.

Think, for example, of a machine check that might point to faulty RAM:
mcelog might direct the admin to run memcheck if its report alone isn't
compelling enough.
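
To make that idea a bit more concrete, here is a rough sketch of what
I have in mind. This is not how mcelog works today as far as I know;
the category label, the table and the advise() function are all made
up for illustration, and only the two warning lines are taken from the
output quoted above.

```python
# Hypothetical sketch: none of these names come from mcelog itself.
SECOND_OPINION = {
    # error category (assumed label) -> follow-up test to suggest
    "memory": "run a memory test such as memcheck",
}


def advise(category):
    """Return the warning text, plus a follow-up hint when one is known."""
    lines = [
        "HARDWARE ERROR. This is *NOT* a software problem!",
        "Please contact your hardware vendor",
    ]
    hint = SECOND_OPINION.get(category)
    if hint is not None:
        # Only add the extra line when we have something useful to say.
        lines.append("For a second opinion, " + hint + ".")
    return "\n".join(lines)


print(advise("memory"))
```

The warning itself stays unchanged; the hint is only appended for
categories where a sensible follow-up test exists, so the message does
not get any weaker for everything else.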

> Thank you.
>
> -Andi (who wonders if <blink> tags in syslog would be useful
> to solve this)

Yikes, ixnay to the <blinkay>. Next people will ask for flash support
to get all-singing and -dancing error messages.

-- Jeroen.