Re: document ext3 requirements

From: Theodore Tso
Date: Sat Jan 03 2009 - 21:32:32 EST


On Sat, Jan 03, 2009 at 01:38:15PM +0100, Pavel Machek wrote:
> +Requirements
> +============
> +
> +Ext3 expects disk/storage subsystem to behave sanely. On sanely
> +behaving disk subsystem, data that have been successfully synced will
> +stay on the disk. Sane means:
> +
> +* writes to media never fail. Even if disk returns error condition during
> + write, ext3 can't handle that correctly, because success on fsync was already
> + returned when data hit the journal.
> +
> + (Fortunately writes failing are very uncommon on disks, as they
> + have spare sectors they use when write fails.)

This is not unique to ext3; per the discussion two weeks ago, it is
largely because the fsync() interface has no way to return errors
caused by failures when creating or modifying parent directories.
Given this, it's a bit misleading to place this in
Documentation/filesystems/ext3.txt. At a minimum it should include
a discussion of what the issues might be, and given that pretty
much no Unix/Linux filesystem has a way of reflecting these
errors to application programs, it probably belongs in a
filesystem-independent documentation file.
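
To make the interface problem concrete, here is a minimal sketch of
what an application can do today, assuming a POSIX system. The helper
name durable_write() and its parameters are made up for illustration;
they are not an existing API. It checks every return value and issues
a second fsync() on the parent directory, because the fsync() on the
file itself does not report failures in updating the directory entry.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Close fd while preserving the errno of an earlier failure. */
static void close_keep_errno(int fd)
{
    int saved = errno;
    close(fd);
    errno = saved;
}

/*
 * durable_write() is a hypothetical helper, not an existing API.
 * It writes len bytes from buf to file_path, then syncs the file and
 * its parent directory dir_path.
 * Returns 0 on success, -1 on failure with errno set.
 */
int durable_write(const char *dir_path, const char *file_path,
                  const char *buf, size_t len)
{
    int fd = open(file_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    /* write() may be short or interrupted, so loop until done. */
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            close_keep_errno(fd);
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }

    /* Syncs the file's data and inode, not its directory entry. */
    if (fsync(fd) < 0) {
        close_keep_errno(fd);
        return -1;
    }
    if (close(fd) < 0)
        return -1;

    /* Separate fsync() on the parent directory for the new name. */
    int dfd = open(dir_path, O_RDONLY);
    if (dfd < 0)
        return -1;
    if (fsync(dfd) < 0) {
        close_keep_errno(dfd);
        return -1;
    }
    return close(dfd);
}
```

Even with both calls succeeding, a media write that fails after
fsync() has returned is still invisible to the program, which is the
limitation under discussion.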

> +* either whole sector is correctly written or nothing is written during
> + powerfail.
> +
> + (Unfortunately, none of the cheap USB/SD flash cards I have seen behave
> + like this, and are unsuitable for ext3. Because RAM tends to fail
> + faster than rest of system during powerfail, special hw killing
> + DMA transfers may be necessary. Not sure how common that problem
> + is on generic PC machines).

Again, this is true for other filesystems. (It was first discovered on
SGI "pizza box" machines running XFS, and special hardware changes
were added to allow DMA aborts.) In fact, because of ext3's use of
physical block journaling, it's much more likely that ext3 will recover
from these sorts of errors. So it's very misleading to have this sort
of discussion in Documentation/filesystems/ext3.txt.

> +* either write caching is disabled, or hw can do barriers and they are enabled.
> +
> + (Note that barriers are disabled by default, use "barrier=1"
> + mount option after making sure hw can support them).

We really should get akpm to agree to accept the patch to enable
barriers by default instead. :-)

- Ted