Re: raid is dangerous but that's secret (was Re: [patch] ext2/3:

From: david
Date: Mon Aug 31 2009 - 11:46:30 EST


On Mon, 31 Aug 2009, Pavel Machek wrote:

Actually, there is something the file system can do to make journaling
safe on degraded RAIDs: make the (checksummed) journal blocks equal to
the RAID stripe size. Or, equivalently, pad out to the RAID stripe
size each commit.

This sometimes leads to awkward block sizes, but while writing
to any *one* stripe on a degraded RAID-5 endangers the others, you
can write to *all* of them with the usual semantics.
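
To make the point above concrete, here is a minimal sketch of XOR parity on a degraded stripe. It assumes a toy layout of three data blocks plus one parity block, with the disk holding the third data block already failed; the helper names (xor_blocks, reconstruct) are hypothetical and not taken from md or any filesystem.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, as RAID-5 parity does."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

def reconstruct(surviving_data, parity):
    """Rebuild the block of the failed disk from the survivors plus parity."""
    return xor_blocks(surviving_data + [parity])

# One stripe: three data blocks and their parity. Disk 2 has failed,
# so its block exists only implicitly, via the parity.
old = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(old)
assert reconstruct(old[:2], parity) == b"CCCC"

# Partial-stripe write, interrupted after the data block hit the disk
# but before the parity was updated: the block on the failed disk,
# which was never written, now reconstructs to garbage.
torn = [b"XXXX", b"BBBB"]
lost_block_torn = reconstruct(torn, parity)

# Full-stripe write: every block and the parity are computed from new
# data only, so nothing depends on old contents that a crash could
# leave inconsistent.
new = [b"XXXX", b"YYYY", b"WWWW"]
new_parity = xor_blocks(new)
lost_block_full = reconstruct(new[:2], new_parity)
```

A full-stripe write can still be torn by a crash, but in that case the only data at risk is the stripe being written, and a checksummed journal block of that size would presumably fail its checksum and be discarded on replay.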

Well, that would work... but you'd also have to journal data, with the
same block size. Not exactly fast, but at least safe...

That's one thing I really like about ZFS: its policy of "don't trust
the disks." If nothing else, simply telling you "your disks f*ed up,
and I caught them doing it", instead of the usual mysterious corruption
detected three months later, is tremendously useful information.

The more I learn about storage, the more I like the idea of zfs. Given the
subtle issues between filesystem and raid layer, integrating them just
makes sense.

Note that all zfs does is tell you that you have already lost data
(and then only if the checksum would fail to validate when a blank
block is returned); it doesn't protect your data.
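
The caveat about blank blocks can be sketched as follows. This is only an illustration under stated assumptions: the 512-byte block size, the one-byte additive checksum stored inside the block, and the function names are all hypothetical, and the SHA-256 digest kept outside the block merely stands in for a strong, separately stored checksum.

```python
import hashlib

BLOCK = 512  # assumed block size for this sketch

def weak_embed(payload):
    """Hypothetical weak scheme: additive checksum in the block's last byte."""
    body = payload.ljust(BLOCK - 1, b"\0")
    return body + bytes([sum(body) % 256])

def weak_ok(block):
    """Validate the embedded additive checksum."""
    return sum(block[:-1]) % 256 == block[-1]

def strong_sum(block):
    """Digest meant to be stored apart from the block it covers."""
    return hashlib.sha256(block).digest()

good = weak_embed(b"important data")
stored = strong_sum(good)   # kept outside the block
blank = bytes(BLOCK)        # what a misbehaving disk might return

# An all-zero block carries an all-zero checksum field, and zero is
# also the sum of its bytes, so the weak scheme accepts it.
blank_passes_weak = weak_ok(blank)

# The separately stored digest does not match the blank block, so the
# loss is detected (but the data is still gone).
blank_passes_strong = strong_sum(blank) == stored
```

In both cases the checksum only reports the loss; recovering the data needs a redundant copy from somewhere else.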

David Lang