Re: ext2/3: document conditions when reliable operation is possible

From: Pavel Machek
Date: Mon Aug 24 2009 - 05:26:40 EST


Hi!

> >> > This is not about barriers (that should be different topic). Atomic
> >> > write means that either whole sector is written, or nothing at all is
> >> > written. Because raid5 needs to update both master data and parity at
> >> > the same time, I don't think it can guarantee this during powerfail.
>
> Actualy raid5 should have no problem with a power failure during
> normal operations of the raid. The parity block should get marked out
> of sync, then the new data block should be written, then the new
> parity block and then the parity block should be flaged in sync.
>
> >> Good point, but I thought that's what journaling was for?
> >
> > I believe journaling operates on assumption that "either whole sector
> > is written, or nothing at all is written".
>
> The real problem comes in degraded mode. In that case the data block
> (if present) and parity block must be written at the same time
> atomically. If the system crashes after writing one but before writing
> the other then the data block on the missng drive changes its
> contents. And for example with a chunk size of 1MB and 16 disks that
> could be 15MB away from the block you actualy do change. And you can
> not recover that after a crash as you need both the original and
> changed contents of the block.
>
> So writing one sector has the risk of corrupting another (for the FS)
> totally unconnected sector. No amount of journaling will help
> there. The raid5 would need to do journaling or use battery backed
> cache.

Thanks, I updated my notes.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/