Re: [patch] document flash/RAID dangers

From: Ric Wheeler
Date: Tue Aug 25 2009 - 20:29:26 EST


On 08/25/2009 08:20 PM, Pavel Machek wrote:
---
There are storage devices that have highly undesirable properties
when they are disconnected or suffer power failures while writes are
in progress; such devices include flash devices and MD RAID 4/5/6
arrays. These devices have the property of potentially
corrupting blocks being written at the time of the power failure, and
worse yet, amplifying the region where blocks are corrupted such that
additional sectors are also damaged during the power failure.

I would strike the entire mention of MD devices since it is your
assertion, not a proven fact. You will cause more data loss from common

That actually is a fact. That's how MD RAID 5 is designed. And btw
those are originally Ted's words.

Ted did not design MD RAID5.

So what? He clearly knows how it works.

Instead of arguing he's wrong, will you simply label everything as
unproven?

events (single sector errors, complete drive failure) by steering people
away from more reliable storage configurations because of a really rare
edge case (power failure during split write to two raid members while
doing a RAID rebuild).

I'm not sure what's rare about power failures. Unlike single sector
errors, my machine actually has a button that produces exactly that
event. Running degraded raid5 arrays for extended periods may be a
slightly unusual configuration, but I suspect people should just do
that for testing. (And from the discussion, people seem to think that
degraded raid5 is equivalent to raid0).

Power failures after a full drive failure with a split write during a rebuild?

Look, I don't need a full drive failure for this to happen. I can just
remove one disk from the array. I don't need a power failure, I can just
press the power button. I don't even need to rebuild anything, I can
just write to a degraded array.
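
The scenario described above (one member missing, a write interrupted
before parity is updated) can be sketched as follows. This is a minimal
illustration only, assuming a single three-disk RAID 5 stripe with plain
XOR parity; the block contents and names such as xor_blocks are
hypothetical, and the sketch does not model the actual MD implementation
or say anything about how often the event occurs in practice.

```python
from functools import reduce

def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equal-length blocks (assumed parity function)."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

# Healthy stripe: two data blocks plus one parity block.
d0 = b"AAAA"
d1 = b"BBBB"
parity = xor_blocks(d0, d1)

# The disk holding d1 is removed. d1 now exists only implicitly,
# as the XOR of d0 and parity.
assert xor_blocks(d0, parity) == d1

# A write to d0 reaches the disk, but power is cut before the
# matching parity update is written.
d0_on_disk = b"CCCC"
parity_on_disk = parity  # stale: still reflects the old d0

# After restart, d1 is reconstructed from inconsistent members.
reconstructed_d1 = xor_blocks(d0_on_disk, parity_on_disk)
d1_corrupted = reconstructed_d1 != d1

# For comparison: had the parity update also completed, d1 would survive.
parity_updated = xor_blocks(d0_on_disk, d1)
d1_if_write_completed = xor_blocks(d0_on_disk, parity_updated)
```

In this sketch d1 was never written to, yet its reconstructed contents
change, which is the "amplification" the proposed text refers to.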

Given that all events are under my control, statistics make little
sense here.
Pavel


You are deliberately causing a double failure - pressing the power button after pulling a drive is exactly that scenario.

Pull your single (non-MD5) disk out while writing (hot unplug from the S-ATA side, leaving power on) and run some tests to verify your assertions...

ric
