Re: Soft-Updates for Linux ?

From: Robert Redelmeier (redelm@ev1.net)
Date: Sun Oct 01 2000 - 23:15:32 EST

Next message: Mike A. Harris: "Re: Hard Disk slowdown w/ new mobo?"
Previous message: Linus Torvalds: "2.4.0-test9-pre8"
Next in thread: Albert D. Cahalan: "Re: Soft-Updates for Linux ?"
Reply: Albert D. Cahalan: "Re: Soft-Updates for Linux ?"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

Daniel Phillips wrote in part:
>
> To be fair, when Soft Updates is working perfectly you will move from a
> situation where you are constantly at risk of catstrophic filesystem
> damage to one where you will just be losing track of some free blocks,
> have some file lengths wrong, and some partly completed writes. That's
> a huge improvement and I'd strongly recommend that anybody who didn't
> have anything better should use it.

This is my approach. AFAIK, SU works fine at protecting the filesystem
unless there is a powerfail in the middle of writing out metadata. This
is a _huge_ reduction in the window-of-vulnerability from the 15 seconds
of a sync to the 1 ms or so to write an inode. But SU is succeptible
to that inode being trashed by a pf in the middle of the write.

I like SU because it is a proven 99% solution with minimal complexity
versus Journalling with a full load of uncertainties. IMHO, only if
versioning were available under JFS would it be worthwhile.

> One thing to keep in mind in all of this is: nobody is testing the
> reliability of their journalling or any other kind of filesystem just by
> running it. To test these things you have to crash/interrupt the system
> *a lot*. Otherwise, you are just fooling yourself and everybody else.
> How many crashes does it take to find that one little window of
> vulnerability that comes up every 10,000 crashes normally but suddenly
> starts coming up every time just because your customer uses their system
> a different way? You're doing the right thing by crash-testing it, now
> instead of doing it 5 times do it 1,000 times. Here's one of my
> favorite tests: unzip a kernel source tree and wait until the disk light
> goes out. A second or so after it comes on again (kflushd) hit the
> reset button.

Good idea. I certainly believe in stressing hardware (see .sig),
but I'm not sure this test is rigorous enough. The problem is
the reset button is only connected to the CPU and the hard disk
will probably continue to write out sectors from it's hw buffer.
OTOH, I don't like the idea of pulling the plug too often. It's
very hard on the hardware. I'd expect a mechanical disk failure
before 10,000 cycles.

> I've never had it that bad but I've been dropped to single user shell
> plenty of times and lost a few files. However I don't think it's fair
> to compare Soft Updates to raw Ext2. One of them is trying to be a
> failsafe filesystem and the other is just hoping to hell the power
> doesn't fail.

Agreed! I'm frankly appalled that Linux is has this vulnerability
at this stage in it's development. Many places in W.Europe have
pretty reliable power, but here in N.America (Houston), single-feed
powerfails are a monthly occurance. Industries who cannot tolerate
this get a second feeder. Then frequency drops to annual.

-- Robert author `cpuburn` http://users.ev1.net/~redelm
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/

Next message: Mike A. Harris: "Re: Hard Disk slowdown w/ new mobo?"
Previous message: Linus Torvalds: "2.4.0-test9-pre8"
Next in thread: Albert D. Cahalan: "Re: Soft-Updates for Linux ?"
Reply: Albert D. Cahalan: "Re: Soft-Updates for Linux ?"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

This archive was generated by hypermail 2b29 : Sat Oct 07 2000 - 21:00:09 EST