Re: RAID-5 design bug (or misfeature)

From: Bill Davidsen
Date: Wed Jun 01 2005 - 13:18:17 EST


Alan Cox wrote:
On Llu, 2005-05-30 at 03:47, Mikulas Patocka wrote:

In article <Pine.LNX.4.58.0505300043540.5305@xxxxxxxxxxxxxxxxxxxxxxxx> you wrote:

I think Linux should stop accessing all disks in a RAID-5 array if two disks
fail, and not write "this array is dead" into the superblocks on the remaining
disks, effectively destroying the whole array.


It discovered the disks had failed because they had outstanding I/O that
failed to complete and errored. At that point your stripes *are*
inconsistent. If it didn't mark them as failed then you wouldn't know the
array was corrupted after a power restore. You can then clean it, fsck it,
restore it, and use mdadm as appropriate to restore the volume and check it.


But the root disk might fail too... This way, the system can't be taken down
by any single disk crash.


It only takes one disk in an array shorting 12V and 5V due to a component
failure to total the entire disk array, and with both IDE and SCSI a
drive failure can hang the entire bus anyway.

Having something called "the entire bus" is more common on SCSI than on IDE (at least well-configured IDE, with one drive per channel), unless you mean the PCI bus. I regularly used to see the failure of one drive make the SCSI controller decide that another drive was bad as well. Fortunately, some change in either the drives or the controller (IBM ServeRAID) has made that a non-problem.
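
To make the point about unrecoverable stripes concrete: RAID-5 keeps a
single XOR parity block per stripe, so md can recompute exactly one
missing block; lose two at once and you have one equation with two
unknowns. A minimal sketch of the arithmetic (toy disk count and block
size, not md's actual code):

/*
 * Toy illustration of RAID-5 single-parity math.  Parity is the XOR
 * of the data blocks in a stripe:
 *
 *     P = D0 ^ D1 ^ ... ^ Dn-1
 *
 * so any ONE missing block is the XOR of the survivors.  Two missing
 * blocks leave one equation with two unknowns, hence unrecoverable.
 */
#include <stdio.h>
#include <string.h>

#define NDISKS	4	/* 3 data blocks + 1 parity block per stripe */
#define BLKSZ	8	/* toy block size in bytes */

/* Rebuild block 'lost' as the XOR of all the surviving blocks. */
static void rebuild(unsigned char blk[NDISKS][BLKSZ], int lost)
{
	int d, i;

	memset(blk[lost], 0, BLKSZ);
	for (d = 0; d < NDISKS; d++) {
		if (d == lost)
			continue;
		for (i = 0; i < BLKSZ; i++)
			blk[lost][i] ^= blk[d][i];
	}
}

int main(void)
{
	unsigned char blk[NDISKS][BLKSZ] = { "data-00", "data-01",
					     "data-02" };

	rebuild(blk, 3);		/* compute parity into blk[3] */

	memset(blk[1], 0xff, BLKSZ);	/* one disk dies...           */
	rebuild(blk, 1);		/* ...and is reconstructed    */
	printf("recovered: %s\n", (char *)blk[1]);

	/*
	 * Had blk[0] failed at the same time, the XOR of the
	 * survivors would yield blk[0] ^ blk[1], not either block
	 * individually, so md is right to consider the stripe lost.
	 */
	return 0;
}

RAID-6 buys survival of a double failure by adding a second,
independent parity block per stripe.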

--
bill davidsen <davidsen@xxxxxxx>
CTO TMR Associates, Inc
Doing interesting things with small computers since 1979