Re: [PATCH] Update Documentation/md.txt to mention journaling won't help dirty+degraded case.

From: Pavel Machek
Date: Thu Sep 03 2009 - 08:33:16 EST

On Thu 2009-09-03 08:05:31, Ric Wheeler wrote:
> On 09/02/2009 06:49 PM, Rob Landley wrote:
>> From: Rob Landley<rob@xxxxxxxxxxx>
>> Add more warnings to the "Boot time assembly of degraded/dirty arrays" section,
>> explaining that using a journaling filesystem can't overcome this problem.
>> Signed-off-by: Rob Landley<rob@xxxxxxxxxxx>
>> ---
>> Documentation/md.txt | 17 +++++++++++++++++
>> 1 file changed, 17 insertions(+)
>> diff --git a/Documentation/md.txt b/Documentation/md.txt
>> index 4edd39e..52b8450 100644
>> --- a/Documentation/md.txt
>> +++ b/Documentation/md.txt
>> @@ -75,6 +75,23 @@ So, to boot with a root filesystem of a dirty degraded raid[56], use
>> md-mod.start_dirty_degraded=1
>> +Note that Journaling filesystems do not effectively protect data in this
>> +case, because the update granularity of the RAID is larger than the journal
>> +was designed to expect. Reconstructing data via parity information involves
>> +matching together corresponding stripes, and updating only some of these
>> +stripes renders the corresponding data in all the unmatched stripes
>> +meaningless. Thus seemingly unrelated data in other parts of the filesystem
>> +(stored in the unmatched stripes) can become unreadable after a partial
>> +update, but the journal is only aware of the parts it modified, not the
>> +"collateral damage" elsewhere in the filesystem which was affected by those
>> +changes.
>> +
>> +Thus successful journal replay proves nothing in this context, and even a
>> +full fsck only shows whether or not the filesystem's metadata was affected.
>> +(A proper solution to this problem would involve adding journaling to the RAID
>> +itself, at least during degraded writes. In the meantime, try not to allow
>> +a system to shut down uncleanly with its RAID both dirty and degraded, it
>> +can handle one but not both.)
>> Superblock formats
>> ------------------
> Now you have moved the inaccurate documentation about journalling file
> systems into the MD documentation.

What is inaccurate about it?

> Repeat after me:

> (1) partial writes to a RAID stripe (with or without file systems, with
> or without journals) create an invalid stripe

That's what he's documenting.
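
To make the failure mode concrete, here is a toy sketch of a single
RAID5-style stripe with XOR parity. It is only an illustration under
simplifying assumptions (two 4-byte data blocks plus one parity block, and a
hypothetical helper xor_blocks()); it is not how the MD driver is
implemented. It shows a write to one block destroying a different block that
the write never touched, once the array is both degraded and dirty.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equal-length blocks byte by byte (hypothetical helper, toy parity)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# A toy 3-disk stripe: two data blocks plus one parity block.
d0 = b"AAAA"                 # block that is about to be rewritten
d1 = b"BBBB"                 # unrelated data, never touched by the write
parity = xor_blocks(d0, d1)

# Degraded: the disk holding d1 is gone, so d1 now exists only implicitly,
# as the XOR of the surviving data block and the parity block.
assert xor_blocks(d0, parity) == d1

# Dirty: the new d0 reaches its disk, but the machine goes down before the
# matching parity update is written.
d0_new = b"CCCC"
stale_parity = parity

# After reboot, d1 is reconstructed from a mismatched pair of blocks.
d1_rebuilt = xor_blocks(d0_new, stale_parity)
d1_intact = d1_rebuilt == d1   # False: d1 was lost without ever being written
```

A filesystem journal would have logged only the update to d0, so replaying it
says nothing about d1.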

> (2) partial writes can be prevented in most cases by running with write
> cache disabled or working barriers

Given how much storage experience you claim, you should know by now that
MD RAID5 does not support barriers...

> Rob, you should really try to take a few disks, build a working MD RAID5
> group and test your ideas. Try it with and without the write cache
> enabled.

...and understand by now that statistics are irrelevant for design.

Ouch, and trying to silence people by telling them to fix the problem
instead of documenting it is not nice either.