Re: High-availability question

Walter Lundby (lundbys@xnet.com)
Mon, 26 Jul 1999 23:23:26 -0500


Steve Underwood wrote:

> "Stephen C. Tweedie" wrote:
>
> > > The data corruption caused by a not correctly plugged in SCSI cable
> > > resulted in both mirror sets being corrupted and the corrupted data
> > > also being mirrored over to the other system.
> >
> > Sure --- garbage in, garbage out. You still aren't proof against an
> > application failure or against writing the wrong data. That doesn't
> > mean that having a redundant storage subsystem is useless: it just means
> > that such redundancy is only part of the problem.
> >
> > btw, Tandem would have caught that scsi error: the existence of such
> > fault tolerant systems shows that it _is_ possible to guard against
> > random machine failures. You are still in trouble if the application is
> > buggy, of course.
>
> You don't need a Tandem to catch this. You just need to turn on the SCSI
> parity. For some reason the majority of systems seem to have it turned off.
> On some older systems parity didn't work properly, but I don't think that is
> an issue today. Still, most systems I look at have parity turned off.
>
> Stratuses die. Tandems die. They die less often, but the consequences are
> worse. In my experience a Stratus will have more than one reboot a year, due
> to various troubles. A just reboot takes over an hour, so one reboot per year
> brings availability down to 99.99%. Tandems have a pretty slow restart too.
> The place were Tandems seem to score well is data integrity - banks don't
> loose many transactions through them, even when they do down.
>
> Steve
>
>

Don't brag about the Tandems too much. With the Helix series they disable
the CRC check in the drive and do it in the io-controller and then modify
non-stop
unix to handle it. This makes for very slow disk-io and very expensive drives.
We have had a very hard time getting Tandems to 5 9s which cost dearly in
payments to customers. Oh well...

A fault tolerant Corba TAU orb is one of the solutions considered for future
replacement of standard fault tolerant systems such as Tandem.

If anybody has a real good and speedy 5 9's solution I am all ears...

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/