Re: [RFC/SERIOUS] grilling troubled CPUs for fun and profit?

From: Dave Jones
Date: Mon Jun 19 2006 - 16:58:25 EST


On Mon, Jun 19, 2006 at 04:47:37PM -0400, linux-os (Dick Johnson) wrote:

> > Why do you think we go to the bother of installing a double fault handler if
> > we're going to reset? Why would we go to the bother of printk'ing
> > information about the double fault if we're about to reset faster than
> > it would get to a serial console ?
>
> I don't know why you go to the bother of installing such a handler.

I even explained why in my mail.

"The box intentionally locks up, so we have a chance to know wtf happened."

What's easier to debug: A box that just spontaneously reboots, or a box
that locks up with doublefault information on screen ?

> Have you actually gotten it to print something? All my experience
> with double-faults (and many with your RH Linux, BTW) result in
> the screen going blank, the POST starting, and the machine re-booting.

No release of Red Hat Linux ever shipped with a double fault hander.
RHEL4 is the only product we sell that has it. Fedora also has it since
FC2 iirc.

> > Your single datapoint is just that, a single datapoint.
> > There are a number of reported cases of CPUs frying themselves.
> > Here's one: http://www.tomshardware.com/2001/09/17/hot_spot/page4.html
> > Google no doubt has more.
> >
> > Another anecdote: Upon fan failure, I once had an athlon MP *completely shatter*
> > (as in broke in two pieces) under extreme heat.
> >
> > This _does_ happen.
>
> Maybe it wasn't a FAN failure?

Same fan was transplanted to another box after that death, where it
was found to be as dead as the CPU & board it was previously plugged into.

Given the heat of the dead CPU, it's safe to say it wasn't operation
at time of death.

Dave

--
http://www.codemonkey.org.uk
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/