Re: raid is dangerous but that's secret (was Re: [patch] ext2/3:document conditions when reliable operation is possible)

From: Ric Wheeler
Date: Mon Aug 31 2009 - 09:14:40 EST


On 08/30/2009 12:35 PM, Christoph Hellwig wrote:
On Sun, Aug 30, 2009 at 06:44:04PM +0400, Michael Tokarev wrote:
If you lose power with the write caches enabled on that same 5 drive
RAID set, you could lose as much as 5 * 32MB of freshly written data on
a power loss (16-32MB write caches are common on s-ata disks these
days).

This is fundamentally wrong. Many filesystems today use either barriers
or flushes (if barriers are not supported), and the times when disk drives
were lying to the OS that the cache got flushed are long gone.

While most common filesystem do have barrier support it is:

- not actually enabled for the two most common filesystems
- the support for write barriers an cache flushing tends to be buggy
all over our software stack,


Or just missing - I think that MD5/6 simply drop the requests at present.

I wonder if it would be worth having MD probe for write cache enabled & warn if barriers are not supported?

For MD5 (and MD6), you really must run with the write cache disabled
until we get barriers to work for those configurations.

I highly doubt barriers will ever be supported on anything but simple
raid1, because it's impossible to guarantee ordering across multiple
drives. Well, it *is* possible to have write barriers with journalled
(and/or with battery-backed-cache) raid[456].

Note that even if raid[456] does not support barriers, write cache
flushes still works.

All currently working barrier implementations on Linux are built upon
queue drains and cache flushes, plus sometimes setting the FUA bit.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/