Re: Hardware Error Kernel Mini-Summit

From: Andi Kleen
Date: Tue Jun 15 2010 - 02:45:13 EST


> There was a case mentioned at the collaboration summit
> meeting where a BIOS bug mis-reported whether ECC was
> enabled - claiming it was on, when in fact it was off.

Yes, I heard about that, but since it's not a single-bit setting
there are lots of different ways it could be broken in theory.

To check it you really need to have a tool that knows about
all the registers and checks them all.
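
As a rough sketch of what "checks them all" could look like (not any
real chipset: the register names, masks and expected values below are
all made up for illustration), such a tool would walk a table of
per-register expectations instead of testing one enable bit:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register indices; a real memory controller differs. */
enum { REG_ECC_CTRL, REG_ECC_MODE, REG_SCRUB_CTRL, REG_ERR_MASK, REG_COUNT };

struct ecc_check {
    int reg;          /* index into the register snapshot */
    uint32_t mask;    /* bits that matter for ECC */
    uint32_t expect;  /* value those bits must have */
    const char *what; /* human-readable description */
};

/* Assumed, illustrative expectations; not taken from any datasheet. */
static const struct ecc_check ecc_checks[] = {
    { REG_ECC_CTRL,   0x00000001u, 0x00000001u, "ECC enable bit set" },
    { REG_ECC_MODE,   0x00000030u, 0x00000020u, "ECC mode field valid" },
    { REG_SCRUB_CTRL, 0x00000100u, 0x00000100u, "scrubber enabled" },
    { REG_ERR_MASK,   0x0000000fu, 0x00000000u, "error reporting unmasked" },
};

/* Returns the number of failed checks; 0 means every check passed. */
static size_t ecc_count_failures(const uint32_t regs[REG_COUNT])
{
    size_t failures = 0;

    for (size_t i = 0; i < sizeof(ecc_checks) / sizeof(ecc_checks[0]); i++) {
        const struct ecc_check *c = &ecc_checks[i];

        if ((regs[c->reg] & c->mask) != c->expect)
            failures++;
    }
    return failures;
}

/* The naive test: look at a single enable bit and nothing else. */
static bool ecc_single_bit_says_on(const uint32_t regs[REG_COUNT])
{
    return (regs[REG_ECC_CTRL] & 0x1u) != 0;
}
```

With a snapshot where only the enable bit is set correctly,
ecc_single_bit_says_on() still reports "on" while
ecc_count_failures() flags the other three registers.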

It's a bit like checking whether someone speaks a foreign language
by asking them a single question with a one-letter answer.

> of the chipset specific code against each other. An EDAC
> driver that tells you that ECC is enabled might be lying too,
> if it is looking at the wrong bit or the wrong register.

Yep.

It's asking a question with a one-word answer where you don't
know the correct answer.

-Andi

--
ak@xxxxxxxxxxxxxxx -- Speaking for myself only.