[patch] document flash/RAID dangers

From: Pavel Machek
Date: Tue Aug 25 2009 - 18:21:33 EST


Hi!

> It seems that you are really hung up on whether or not the filesystem
> metadata is consistent after a power failure, when I'd argue that
> storage devices that don't have good powerfail properties have much
> bigger problems (such as the potential for silent data corruption, or,
> even if fsck will fix a trashed inode table with ext2, massive data
> loss). So instead of your suggested patch, it
> might be better simply to have a file in Documentation/filesystems
> that states something along the lines of:
>
> "There are storage devices that have highly undesirable properties
> when they are disconnected or suffer power failures while writes are
> in progress; such devices include flash devices and software RAID 5/6
> arrays without journals, as well as hardware RAID 5/6 devices without
> battery backups. These devices have the property of potentially
> corrupting blocks being written at the time of the power failure, and
> worse yet, amplifying the region where blocks are corrupted such that
> adjacent sectors are also damaged during the power failure.

In the FTL case, damaged sectors are not necessarily adjacent. Otherwise
this looks okay and fair to me.
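
One way the "amplified" damage described above can arise on parity RAID is sketched below. This is only a toy model under stated assumptions: a three-disk stripe with two data blocks and one XOR parity block, a power failure that lands the data write but not the parity write, and a later loss of the other data disk. The names (`xor_blocks`, `d0`, `d1`, `parity`) are made up for illustration and are not taken from any real RAID implementation.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equally sized blocks (RAID 5 style parity)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# A toy stripe: two data blocks and the parity computed over them.
d0 = b"AAAA"
d1 = b"BBBB"
parity = xor_blocks(d0, d1)

# Rewrite d0, then "lose power" before the matching parity update is written.
d0 = b"CCCC"
# parity is now stale: it still describes the old contents of d0.

# Later the disk holding d1 fails and d1 is rebuilt from d0 and parity.
rebuilt_d1 = xor_blocks(d0, parity)

# d1 was never being written, yet its reconstructed contents are wrong.
print(rebuilt_d1 == b"BBBB")  # False
```

The point of the sketch is that the block that ends up damaged is not the one that was being written when power failed, which is the property a journalling filesystem cannot compensate for.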

> Users who use such storage devices are well advised to take
> countermeasures, such as the use of Uninterruptible Power Supplies,
> and making sure the flash device is not hot-unplugged while the device
> is being used. Regular backups when using these devices are also a
> Very Good Idea.
>
> Otherwise, file systems placed on these devices can suffer silent data
> and file system corruption. A forced use of fsck may detect metadata
> corruption resulting in file system corruption, but will not suffice
> to detect data corruption."
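
Since fsck only examines filesystem metadata, detecting damaged file contents has to happen somewhere else. A minimal sketch of one common approach, an application-level checksum recorded while the data is known to be good, is shown below. The helper names (`digest`, `is_intact`) and the sample data are hypothetical and only illustrate the idea; nothing here is proposed by the thread itself.

```python
import hashlib


def digest(data: bytes) -> str:
    """Return a SHA-256 hex digest of the given file contents."""
    return hashlib.sha256(data).hexdigest()


def is_intact(data: bytes, expected: str) -> bool:
    """True if the contents still match the digest recorded earlier."""
    return digest(data) == expected


# Record a digest while the data is known to be good.
original = b"important record\n"
recorded = digest(original)

# Simulate silent corruption: same length, one byte changed, so the
# file size and other metadata would look perfectly consistent.
damaged = b"important rec\x00rd\n"

print(is_intact(original, recorded))
print(is_intact(damaged, recorded))
```

The recorded digest would have to live on storage that is not subject to the same failure, for example alongside the backups.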

Ok, would you be against adding:

"Running a non-journalled filesystem on these may be desirable, as
journalling cannot provide meaningful protection anyway."

> My big complaint is that you seem to think that ext3 somehow let you
> down, but I'd argue that the real issue is that the storage device let
> you down. Any journaling filesystem will have the properties that you
> seem to be complaining about, so the fact that your patch only
> documents this as assumptions made by ext2 and ext3 is unfair; it also
> applies to xfs, jfs, reiserfs, reiser4, etc. Furthermore, most
> users

Yes, it applies to all journalling filesystems; it is just that I was
clever/paranoid enough to avoid anything non-ext3.

The ext3 docs still say:
# The journal supports the transactions start and stop, and in case of a
# crash, the journal can replay the transactions to quickly put the
# partition back into a consistent state.

> are even more concerned about possibility of massive data loss and/or
> silent data corruption. So if your complaint is that we don't have
> documentation warning users about the potential pitfalls of using
> storage devices with undesirable power fail properties, let's document
> that as a shortcoming in those storage devices.

Ok, works for me.

---

From: Theodore Tso <tytso@xxxxxxx>

Document that many devices are too broken for filesystems to protect
data in case of powerfail.

Signed-off-by: Pavel Machek <pavel@xxxxxx>

diff --git a/Documentation/filesystems/dangers.txt b/Documentation/filesystems/dangers.txt
new file mode 100644
index 0000000..e1a46dd
--- /dev/null
+++ b/Documentation/filesystems/dangers.txt
@@ -0,0 +1,19 @@
+There are storage devices that have highly undesirable properties
+when they are disconnected or suffer power failures while writes are
+in progress; such devices include flash devices and software RAID 5/6
+arrays without journals, as well as hardware RAID 5/6 devices without
+battery backups. These devices have the property of potentially
+corrupting blocks being written at the time of the power failure, and
+worse yet, amplifying the region where blocks are corrupted such that
+additional sectors are also damaged during the power failure.
+
+Users who use such storage devices are well advised to take
+countermeasures, such as the use of Uninterruptible Power Supplies,
+and making sure the flash device is not hot-unplugged while the device
+is being used. Regular backups when using these devices are also a
+Very Good Idea.
+
+Otherwise, file systems placed on these devices can suffer silent data
+and file system corruption. A forced use of fsck may detect metadata
+corruption resulting in file system corruption, but will not suffice
+to detect data corruption.
\ No newline at end of file

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html