Re: Early boot hang on recent 2.6 kernels (> 2.6.3), on x86-64 with 16gb of RAM

From: Robin Lee Powell
Date: Mon Sep 18 2006 - 15:06:46 EST


On Mon, Sep 18, 2006 at 09:50:41AM +0200, Andi Kleen wrote:
> Robin Lee Powell <rlpowell@xxxxxxxxxxxxxxxxxx> writes:
> >
> > This version is rather different, as it ends in:
> >
> > HARDWARE ERROR
> > CPU 0: Machine Check Exception: 7 Bank 3: b40000000000083b
> > RIP 10:<ffffffff80446e3e> {pci_conf1_read+0xbe/0xf0}
> > TSC 2e7932dbf8 ADDR fdfc000cfc
> > This is not a software problem!
> > Run through mcelog --ascii to decode and contact your hardware vendor
> > Kernel panic - not syncing: Uncorrected machine check
>
> Decoded it gives
>
> ..
> bus error 'local node origin, request didn't time out
> data read mem transaction
> i/o access, level generic'
> ..
>
> It will probably boot with mce=off acpi=off pci=conf1

Indeed, it does! Even on the versions that weren't hitting an MCE.
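
For anyone following along, the decoded text quoted above can be reproduced by hand from the status word. Below is a minimal Python sketch. It assumes the architectural x86 MCi_STATUS layout (flag bits 63..57, MCA error code in bits 15..0) and the bus-error code format 0000 1PPT RRRR IILL. The function name and label strings are hypothetical, modeled on the wording quoted above rather than taken from mcelog itself.

```python
# Sketch only: decode the flag bits and a bus-error MCA code from MCi_STATUS.
# Label strings are illustrative, not mcelog's exact output.
FLAG_BITS = [(63, "VAL"), (62, "OVER"), (61, "UC"), (60, "EN"),
             (59, "MISCV"), (58, "ADDRV"), (57, "PCC")]
PARTICIPATION = ["local node origin", "local node response",
                 "local node observed", "generic"]
REQUEST = ["generic", "generic read", "generic write", "data read",
           "data write", "instruction fetch", "prefetch", "evict", "snoop"]
MEM_IO = ["memory access", "reserved", "i/o access", "generic"]
LEVEL = ["level 0", "level 1", "level 2", "level generic"]


def decode_mci_status(status):
    """Return the set flags and, for bus errors, the decoded code fields."""
    flags = [name for bit, name in FLAG_BITS if (status >> bit) & 1]
    code = status & 0xFFFF  # MCA error code lives in the low 16 bits
    result = {"flags": flags, "error_code": code, "bus_error": None}
    if code >> 11 == 1:  # bits 15..11 == 00001 marks a bus error
        rrrr = (code >> 4) & 0xF
        result["bus_error"] = {
            "participation": PARTICIPATION[(code >> 9) & 0x3],
            "timed_out": bool((code >> 8) & 0x1),
            "request": REQUEST[rrrr] if rrrr < len(REQUEST) else "reserved",
            "mem_io": MEM_IO[(code >> 2) & 0x3],
            "level": LEVEL[code & 0x3],
        }
    return result


if __name__ == "__main__":
    print(decode_mci_status(0xB40000000000083B))
```

For b40000000000083b this reports VAL, UC, EN and ADDRV set, and the bus-error fields come out as local node origin, no timeout, data read, i/o access, level generic, which agrees with the decoded text quoted above.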

> You got some buggy device that causes a bus timeout when its
> config space is read. The old kernel most likely didn't touch it
> by luck.
>
> Please add the following patch and send the whole log. This will
> tell us which device has this problem.

OK. I'll post results in a bit.
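
Since the fault is reported in pci_conf1_read, here is a rough Python sketch of what a type 1 config access addresses. It assumes the standard PCI configuration mechanism #1, where the address goes to I/O port 0xCF8 and the data is read from 0xCFC. The helper names are made up for illustration, and this is not the kernel's code.

```python
# Sketch only: how a PCI "conf1" (mechanism #1) access is addressed.
CONFIG_ADDRESS_PORT = 0xCF8  # 32-bit address register
CONFIG_DATA_PORT = 0xCFC     # data window, 4 bytes wide


def conf1_address(bus, dev, func, reg):
    """Build the 32-bit value written to 0xCF8 for a config-space register."""
    if not (0 <= bus <= 0xFF and 0 <= dev <= 0x1F
            and 0 <= func <= 0x7 and 0 <= reg <= 0xFF):
        raise ValueError("bus/dev/func/reg out of range for conf1")
    # Bit 31 enables the access; the register offset is dword-aligned.
    return 0x80000000 | (bus << 16) | (dev << 11) | (func << 8) | (reg & 0xFC)


def conf1_data_port(reg):
    """I/O port used for the data phase; low two bits select the byte lane."""
    return CONFIG_DATA_PORT + (reg & 0x3)


if __name__ == "__main__":
    # Hypothetical device 00:18.3, register 0x44, chosen only as an example.
    print(hex(conf1_address(0, 0x18, 3, 0x44)), hex(conf1_data_port(0x44)))
```

The data port 0xCFC appears to line up with the cfc at the end of the ADDR value in the MCE report, although that reading is an inference and not something stated in the thread.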

-Robin

--
http://www.digitalkingdom.org/~rlpowell/ *** http://www.lojban.org/
Reason #237 To Learn Lojban: "Homonyms: Their Grate!"
Proud Supporter of the Singularity Institute - http://singinst.org/