Re: PROBLEM: Fatal Machine Check >= 3.13.5-101.fc19.x86_64

From: Tony Luck
Date: Fri Mar 21 2014 - 16:37:25 EST


On Fri, Mar 21, 2014 at 1:13 PM, Borislav Petkov <bp@xxxxxxxxx> wrote:
> Provided the decode is correct and I'm reading it right, this looks
> like the cores get to livelock for some reason without any forward
> progress. The MCEs signal that there hasn't been any instruction retired
> in a relatively long time, thus a stall.

Agreed. There are some bus-level errors (low 16 bits of STATUS 0x0800)
and some timeouts (low 16 bits 0x0400).
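For anyone following along: the low 16 bits of IA32_MCi_STATUS hold the architectural MCA error code. The following is a minimal sketch of how the two patterns above could be told apart, based on my reading of the Intel SDM error-code tables. The function and enum names are hypothetical (this is not a kernel or mcelog API), and only these two classes are decoded.

```c
#include <stdint.h>

/* Rough categories for the MCA error code (bits 15:0 of IA32_MCi_STATUS).
 * Only the two classes discussed in this thread are recognised. */
enum mca_class {
    MCA_OTHER,            /* anything this sketch does not decode */
    MCA_INTERNAL_TIMER,   /* simple code 0000 0100 0000 0000 (0x0400) */
    MCA_BUS_INTERCONNECT, /* compound code 000F 1PPT RRRR IILL */
};

/* Hypothetical helper: classify a raw MCi_STATUS value. */
static enum mca_class mca_classify(uint64_t status)
{
    /* The MCA error code lives in the low 16 bits; upper bits
     * (VAL, UC, PCC, model-specific code, ...) are ignored here. */
    uint16_t code = (uint16_t)(status & 0xFFFF);

    /* Simple error code: processor internal timer (timeout) error. */
    if (code == 0x0400)
        return MCA_INTERNAL_TIMER;

    /* Compound codes: bit 12 is the correction-report filtering bit (F),
     * so clear it first. Bit 11 set with bits 15:13 clear marks a
     * bus/interconnect error. */
    uint16_t compound = code & (uint16_t)~0x1000u;
    if ((compound & 0xF800) == 0x0800)
        return MCA_BUS_INTERCONNECT;

    return MCA_OTHER;
}
```

For example, a status whose low bits are 0x0400 classifies as MCA_INTERNAL_TIMER, and one whose low bits are 0x0E0F classifies as MCA_BUS_INTERCONNECT.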

> You say, this happens when gnome starts. Does it also happen if you
> don't start gnome, i.e. don't start X at all? Try booting into a
> runlevel without graphics.
>
> Tony, any other ideas?

My best guess is the graphics driver (?) making wild accesses to some I/O
registers that never respond. If booting without graphics works, that adds
some weight to the theory.

Other useful tests would be to check upstream kernels 3.12 and 3.13, to
see if something is odd in the Fedora additions, and 3.14-rc7, to see
if it is already fixed upstream.

If upstream 3.12 works and 3.13 breaks (and it is not fixed in 3.14-rc7),
then bisecting between 3.12 and 3.13 would be helpful.
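Bisecting (e.g. with git bisect) is at heart a binary search for the first bad commit, so it needs only about log2(N) build-and-boot cycles for N candidate commits. The following is a minimal sketch of that search, under stated assumptions: the commits between the good and bad release are indexed 0..n-1, the last one is known bad, and once a commit is bad every later one is too. The is_bad array is a hypothetical stand-in for results that, in practice, you only learn by building and booting each kernel.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sketch: return the index of the first bad commit.
 * Assumes n >= 1, is_bad[n - 1] is true (the known-bad release), and
 * badness is monotonic (good ... good, bad ... bad).
 * *tests counts how many commits had to be "built and booted". */
static size_t first_bad(const bool *is_bad, size_t n, unsigned *tests)
{
    size_t lo = 0;      /* everything before lo is known good */
    size_t hi = n - 1;  /* hi is known bad */

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        (*tests)++;             /* one build/boot/test cycle */
        if (is_bad[mid])
            hi = mid;           /* first bad is at mid or earlier */
        else
            lo = mid + 1;       /* first bad is strictly after mid */
    }
    return lo;
}
```

For five candidate commits where the third one introduced the hang, the search finds index 2 after testing only two kernels.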

-Tony