I'm not sure what's rare about power failures. Unlike single sector
errors, my machine actually has a button that produces exactly that
event. Running degraded raid5 arrays for extended periods may be a
slightly unusual configuration, but I suspect people should just do
that for testing. (And from the discussion, people seem to think that
degraded raid5 is equivalent to raid0.)
Power failures after a full drive failure with a split write during a rebuild?
Look, I don't need a full drive failure for this to happen. I can just
remove one disk from the array. I don't need a power failure, I can just
press the power button. I don't even need to rebuild anything, I can
just write to a degraded array.
Given that all events are under my control, statistics make little
sense here.
You are deliberately causing a double failure - pressing the power button
after pulling a drive is exactly that scenario.
Exactly. And now I'm trying to get that documented, so that people
don't do it and still expect their fs to be consistent.
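
To make the failure mode concrete, here is a toy model of one stripe
on a three-disk raid5 with one data disk removed. It is only a sketch
under my own assumptions (4-byte chunks, plain XOR parity, the data
chunk reaching disk before the parity chunk); the names are made up
for illustration and nothing here is the md driver's code.

```python
def xor(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length chunks."""
    return bytes(x ^ y for x, y in zip(a, b))


def torn_write_on_degraded_stripe():
    """Return (original d1, d1 as reconstructed after a torn write)."""
    d0_old = b"AAAA"              # chunk on a disk that is still present
    d1 = b"BBBB"                  # chunk on the disk that was pulled
    parity_old = xor(d0_old, d1)  # parity written while the array was healthy

    # Degraded mode: d1 exists only as d0 XOR parity.
    assert xor(d0_old, parity_old) == d1

    # The fs rewrites d0. A complete update would also write new parity.
    d0_new = b"CCCC"

    # Power button pressed after d0 hit the disk, before parity did
    # (assumed ordering; the reverse ordering breaks things the same way).
    on_disk_d0, on_disk_parity = d0_new, parity_old

    # After reboot the array reconstructs d1 from what is on disk.
    reconstructed_d1 = xor(on_disk_d0, on_disk_parity)
    return d1, reconstructed_d1


if __name__ == "__main__":
    original, reconstructed = torn_write_on_degraded_stripe()
    print("d1 before:", original, "d1 after:", reconstructed)
```

The point of the toy: d1 was never written to, yet it comes back
different, which is something a journaling fs does not expect from
the block device underneath it.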
Pull your single (non-MD5) disk out while writing (hot unplug from the
S-ATA side, leaving power on) and run some tests to verify your
I actually did that some time ago, pulling a SATA disk (I actually
pulled both SATA *and* power -- that was the way the hotplug envelope
worked; that's a harsher test than what you suggest, so that should
be ok). The write test was fsync heavy, with logging to a separate
drive, checking that all the data where fsync succeeded are indeed
accessible. I uncovered a few bugs in ext* that jack fixed, I uncovered
some libata weirdness that is not yet fixed AFAIK, but with all the
patches applied I could not break that single SATA disk.
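
For anyone who wants to repeat this, the kind of test described above
can be sketched roughly as follows. This is not the tool I used, just
a minimal illustration: the record layout, RECORD_SIZE and the
function names are all made up, and it assumes the log file lives on
a different drive than the one being pulled.

```python
import os

RECORD_SIZE = 64  # hypothetical fixed record size, in bytes


def make_record(i: int) -> bytes:
    """Build the fixed-size payload expected for record number i."""
    body = f"record {i:010d} ".encode()
    return body.ljust(RECORD_SIZE - 1, b".") + b"\n"


def write_records(data_path: str, log_path: str, count: int) -> None:
    """Write records to the disk under test, fsync each one, and only
    then note its number in a log kept on a separate drive."""
    with open(data_path, "wb") as data, open(log_path, "w") as log:
        for i in range(count):
            data.write(make_record(i))
            data.flush()
            os.fsync(data.fileno())   # raises OSError if the sync fails
            log.write(f"{i}\n")       # reached only after fsync succeeded
            log.flush()
            os.fsync(log.fileno())


def verify(data_path: str, log_path: str) -> list:
    """Return the numbers of logged records that cannot be read back intact."""
    with open(log_path) as log:
        acked = [int(line) for line in log if line.strip()]
    missing = []
    with open(data_path, "rb") as data:
        for i in acked:
            data.seek(i * RECORD_SIZE)
            if data.read(RECORD_SIZE) != make_record(i):
                missing.append(i)
    return missing
```

Run write_records() against the disk under test, pull the disk or
press the button partway through, then after reboot run verify(); any
number it returns is data that fsync claimed was safe and that is not
there. A real test would also have to cope with a torn last line in
the log, which this sketch ignores.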