Re: Hardware design to prevent Damages... [WAS: [patch 23/37] i2c-piix4: Blacklist two mainboards]

From: Jean Delvare
Date: Thu May 15 2008 - 14:51:25 EST


On Wed, 14 May 2008 21:52:53 +0200, Michelle Konzack wrote:
> On 2008-05-13 13:12:25, Greg KH wrote:
> > We had a report that running sensors-detect on a Sapphire AM2RD790
> > motherboard killed the CPU. While the exact cause is still unknown,
> > I'd rather play it safe and prevent any access to the SMBus on that
> > machine by not letting the i2c-piix4 driver attach to the SMBus host
> > device on that machine. Also blacklist a similar board made by DFI.
> ------------------------ END OF REPLIED MESSAGE ------------------------
>
> Hell, since I do not depend on LOW-BUDGET hardware, I have never killed
> a CPU, even by SICK programming; even my SuperSparc CPU survived.
>
> Please can anyone advise me about VERY VERY good hardware design to
> prevent its destruction by software errors?
>
> You can even send me examples of such broken Hardware...

In this particular case, the CPU was apparently damaged as the result
of an accidental memory over-voltage. It is worth noting, though, that
said CPU had gone through an intensive overclocking session beforehand,
and this might explain the death. Other CPUs are known to have gone
through the same experience and are still working.

So, to clear up any misunderstanding: the CPU damage did not occur
because we used some odd CPU instruction sequence or anything like
that. The damage came to the CPU from other hardware on the board.

To be a bit more technical, the design mistake (I think) made by the
designers of the motherboard in question was to use, on a PC
motherboard, an I2C/SMBus chip which is controlled through SMBus
receive byte and SMBus send byte, and which lives at an I2C address
that is very common amongst hardware monitoring chips. The 4th factor
is, of course, that improperly programming the chip in question can
result in hardware damage. If only 3 of these 4 factors had been
present, there would most probably have been no issue in practice. But
with all 4 factors, bad things just had to happen. And it's not just
Linux: users had similar problems running hardware monitoring tools
under Windows too.
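
To make the combination of factors more concrete, here is a small
simulation in Python. It touches no real hardware, and everything in it
is hypothetical: the addresses, the class names, the probe sequence and
the "setting" value are assumptions chosen for illustration, not the
actual chip on these boards nor the exact sequence any detection tool
uses.

```python
# Hypothetical simulation only; no real SMBus is accessed.
HWMON_PROBE_ADDRESSES = range(0x2C, 0x30)  # assumed probe range


class RegisterChip:
    """Monitoring-style chip: Send Byte only selects a register."""

    def __init__(self):
        self.pointer = 0
        self.registers = [0] * 256

    def send_byte(self, value):
        self.pointer = value  # harmless: picks the register to read next

    def receive_byte(self):
        return self.registers[self.pointer]


class ControlChip:
    """Hypothetical chip that treats each Send Byte as a control command."""

    def __init__(self, setting=0x10):
        self.setting = setting  # stands in for e.g. a voltage setting

    def send_byte(self, value):
        self.setting = value  # the byte itself becomes the new setting

    def receive_byte(self):
        return self.setting


def probe(bus, addresses):
    """Naive detection: Send Byte 0x00, then Receive Byte, per address."""
    found = {}
    for addr in addresses:
        chip = bus.get(addr)
        if chip is None:
            continue  # nothing answers at this address
        chip.send_byte(0x00)
        found[addr] = chip.receive_byte()
    return found


bus = {0x2D: RegisterChip(), 0x2E: ControlChip()}
before = bus[0x2E].setting
probe(bus, HWMON_PROBE_ADDRESSES)
after = bus[0x2E].setting
print(f"control chip setting: {before:#04x} -> {after:#04x}")
```

In this sketch the same probe that is harmless for the register-style
chip silently reprograms the control-style chip, simply because both
answer at addresses in the probed range.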

--
Jean Delvare