Re: [PATCH 1/4] integrity: TPM internal kernel interface

From: Alan Cox
Date: Fri Aug 15 2008 - 17:40:19 EST


On Fri, 15 Aug 2008 14:50:01 -0400
Kenneth Goldman <kgoldman@xxxxxxxxxx> wrote:

> "Peter Dolding" <oiaohm@xxxxxxxxx> wrote on 08/15/2008 06:37:27 AM:
>
> > Remember even soldered on stuff can fail. How linux handles the
> > death of the TPM module needs to be covered.
>
> Is fault tolerance a requirement just for the TPM, or is it a general
> Linux requirement? Has it always been there, or is it new?

We try very, very hard not to crash on failure.

> For example, does kernel software have to gracefully handle
> failures in the disk controller, processor, memory controller, BIOS
> flash memory, etc?

Our disk layer will retry, reset and change cable speeds, and if that
fails and you are running RAID with multipath or sufficient mirrors, it
will continue. We capture processor exceptions and, when possible, log
and continue, although most CPU failures report with the context
corrupt. For memory errors we log, and the EDAC layer handles as much as
it possibly can (actually we could be a bit more selective here, and
there are proposals to go further).
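A rough model of that escalation, for illustration only: each step is tried only if the previous, less disruptive one failed, and the last resort is an error returned to the caller rather than a crash. The names (`recover`, `struct sim_disk`, the `REC_*` steps) are hypothetical; this is not how libata or the SCSI error handler are actually structured, and the simulated disk simply records which step would fix it.

```c
#include <stdbool.h>

/* Recovery actions, ordered from least to most disruptive.
 * Illustrative model only, not the real libata/SCSI error handler. */
enum recovery {
    REC_RETRY,      /* reissue the failed command */
    REC_RESET,      /* reset the controller/link, then reissue */
    REC_SLOW_LINK,  /* drop to a slower "cable" speed, then reissue */
    REC_FAILOVER,   /* let a RAID mirror or another path serve the I/O */
    REC_FAILED      /* give up: report an I/O error to the caller */
};

/* Hypothetical simulated disk: commands start succeeding once recovery has
 * escalated to `fixed_by`. Any value above REC_SLOW_LINK models a disk
 * that never comes back on its own. */
struct sim_disk {
    enum recovery fixed_by;
    bool has_redundancy;  /* mirror or alternate path available */
};

/* Would a command succeed after recovery has reached `level`? */
static bool sim_issue(const struct sim_disk *d, enum recovery level)
{
    return level >= d->fixed_by;
}

/* Escalate one step at a time and return the first action that worked. */
static enum recovery recover(const struct sim_disk *d)
{
    if (sim_issue(d, REC_RETRY))
        return REC_RETRY;
    if (sim_issue(d, REC_RESET))
        return REC_RESET;
    if (sim_issue(d, REC_SLOW_LINK))
        return REC_SLOW_LINK;
    if (d->has_redundancy)
        return REC_FAILOVER;
    return REC_FAILED;
}
```

A disk that only needs a reset yields REC_RESET; a dead disk in a mirrored array yields REC_FAILOVER, and the same dead disk with no redundancy yields REC_FAILED, i.e. an I/O error rather than a hung or crashed box.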

> I'd think it would be quite hard to code around motherboard
> failures in a commodity platform not designed for fault tolerance.

The Linux userbase ranges from fault tolerant systems like Stratus to
dodgy cheapo boards from iffy cheap and cheerful computer merchants so it
makes sense to try and be robust.

In your TPM case, being robust against the TPM ceasing to respond is
certainly worthwhile, so that at least you return an error on failure
rather than the box dying. You may well not be able to get the chip back
in order without a hardware change or a reboot.
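For the TPM, that mostly comes down to never waiting on the chip without a bound. A minimal sketch, assuming a simulated chip: `struct sim_tpm`, `sim_tpm_ready` and `tpm_wait_ready` are hypothetical names, not the in-kernel TPM driver API. The status is polled a bounded number of times and a timeout is reported as -ETIME, which the caller can pass up as an ordinary error.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical simulated TPM: the status reads "ready" once more than
 * `ready_after` polls have happened; a negative value models a chip that
 * has stopped responding entirely. */
struct sim_tpm {
    int ready_after;
    int polls;  /* how many times the status has been read */
};

static bool sim_tpm_ready(struct sim_tpm *chip)
{
    chip->polls++;
    return chip->ready_after >= 0 && chip->polls > chip->ready_after;
}

/* Poll at most `max_polls` times; on timeout return -ETIME so the request
 * fails cleanly instead of hanging. A real driver would sleep between
 * polls and use a time-based deadline rather than a poll count. */
static int tpm_wait_ready(struct sim_tpm *chip, int max_polls)
{
    for (int i = 0; i < max_polls; i++)
        if (sim_tpm_ready(chip))
            return 0;
    return -ETIME;
}
```

A chip that becomes ready after two polls returns 0 on the third poll; a chip that never answers uses up all its polls and the caller gets -ETIME to report, instead of the machine locking up.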

Alan