Re: fsck is dead (was: Some very thought-provoking ...)

Steve Underwood (steveu@netpage.com.hk)
Tue, 29 Jun 1999 05:10:07 +0000


Chris Adams wrote:

> Once upon a time, Michael B. Trausch <mtrausch@wcnet.org> said:
> >> I talked to someone a few weeks ago who described a system he just sold --
> >> 5 minutes of down time costs this customer about $500,000. That means that
> >> the 33 minute fsck costs about 3.3 million dollars. Do *you* want to tell
> >> this person that fsck is tolerable?
> >
> >Um, can you say, UPS, and "maintainence"? If proper amounts of both are
> >given, the chances of an fsck are almost NONE, in my experience and that
> >of *MY* clients. I don't know about anyone else...
>
> Crashes happen. RAM fails, CPUs fail, etc. I don't know about
> "maintainence", but maintenance doesn't fix hardware failures, and there
> is no way to work around a CPU failure under Linux. ECC RAM can help
> protect you from failures there, and you can do RAID for disks and have
> redundant power supplies hooked up to multiple UPSs, reducing your
> chances of failure, but you cannot reduce the chance to zero.
>
> "almost NONE" is not never and doesn't cut it. Like you said, you don't
> know about anyone else.

The subject wasn't hardware fault tolerance. It was a desire for faster reboots
on a vanilla server. The correct answer to maximising up time in such a situation
is indeed to take care to avoid need for reboots - good power, good air con,
clean filters, tidy wiring, etc. Most people have a poor attitiude to these
things, but they can do far more to improve most people's up-time than a faster
fsck ever could. I've had a lot of bad experiences with ECC RAM *not* protecting
against failures (many designs are buggy), but I've had excellent experience with
a cool dust free machine room avoiding them. How can I know that I am avoiding
failures? Well, I just have to compare our failure rates with those of other with
similar hardware and poorly managed installations.

Start with the weakest link!

Steve

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/