Re: ext4 features (checksums)

From: Neil Brown
Date: Mon Jul 03 2006 - 19:30:12 EST


On Monday July 3, alan@xxxxxxxxxxxxxxxxxxx wrote:
> Ar Llu, 2006-07-03 am 23:01 +0200, ysgrifennodd Arjan van de Ven:
> > raid is great for protecting against individual disks or sectors going
> > bad. But raid, especially high performance implementations, do not
> > checksum data or detect corruptions.
> >
> > They're different purpose with almost zero overlap in purpose or even
> > goal...
>
> Same layer though - checksums are really a device mapper type problem
> rather than an fs type problem.

Can't say I agree with this layering distinction.
It's been some years that I've felt that most 'logical volume
management' really belongs in the filesystem.
Why have a dm that chops devices up in to segments and assembles them to
look like a big device, only to have that big device chopped up and
presented as files. Seems like double handling to me.

With checksums - the filesystem is in a better position to:
- be selective about what is checksummed - no point checksumming
blocks that aren't part of any file. Some blocks (highlevel
metadata) might always be checksummed, while other blocks
(regular data) might not if a 'fast' option was chosen.
- record the checksum somewhere easily accessible. The dm layer
could do little better than store a block of checksums for every 10
blocks of data. A filesystem can store checksums with indexing
information, or ensure that checksums for consecutive blocks in a
file are stored together, even if the blocks cannot be.

I think that for a filesystem that makes heavy use of trees to find
things, it makes a lot of sense to checksum and replicate the upper
levels of the tree, while checksumming and replicating lower levels
has a very different cost/benefit tradeoff. These distinctions are
easy to make in a filesystem, and hard to make in a block device.

To my mind, the only thing you should put between the filesystem and
the raw devices is RAID (real-raid - not raid0 or linear).

NeilBrown
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/