Re: Ext4 and the "30 second window of death"

From: Nick Piggin
Date: Thu Apr 02 2009 - 14:35:29 EST


On Friday 03 April 2009 05:22:48 david@xxxxxxx wrote:
> On Wed, 1 Apr 2009, Matthew Garrett wrote:
>
> >> The other subtlety comes if we add fsync() suppression to laptop mode
> >> --- which is something that Bart Samwel is very interested in doing
> >> and I talked to him at FOSDEM about this. As Jeff Garzik recently
> >> pointed out, however, if we let the system reorder writes across
> >> fsync() boundaries, or if we combine two writes to the same block
> >> separated by an fsync(), and the system crashes in the middle of
> >> pushing all of these blocks out to the disk, we can end up trashing
> >> the consistency guarantees of a database such as mysql or postgres.
> >> It's a good point, but it only applies if we add fsync() suppression
> >> to laptop mode --- which we haven't done yet.
> >
> > I've got absolutely no idea why anyone would want fsync() to stop
> > meaning "Put my data on the disk please". laptop-mode isn't intended to
> > reduce data integrity - it's intended to batch disk write-outs such that
> > there's a lower risk of needing to perform further write-outs in future.
> > It makes sense for applications which really desperately want
> > information on disk to fsync() (for instance, saving a file in
> > OpenOffice).
> >
> > laptop-mode is something that makes sense as a default behaviour under a
> > lot of circumstances. Adding fsync() suppression means it's utterly
> > impossible to use it in that way. An additional mode would be perfectly
> > reasonable, as long as it's made clear that it's really a request for
> > data to be discarded at some point. The current mode isn't.
>
> this issue seems pretty straightforward to me
>
> the apps do fsync (and similar) to the degree that they think their data
> is important (potentially with config options if they acknowledge that
> their data isn't _always_ that important)
>
> the system allows the admin to override the application and say "I'm
> willing to lose up to X seconds of data for other benefits"
>
> if this can work cleanly (with the ordering issue that was identified,
> which may involve having multiple versions of the metadata cached) it
> seems like a very clean interface.

It isn't just about the ordering of writes to a filesystem. A database program
commits a transaction and then tells the client that it is safe. Client
then goes and does <something> in response to that, which may or may not
involve more writes to the filesystem.
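
The commit-then-acknowledge pattern described above can be sketched roughly
as follows. This is only an illustration, not how any particular database
does it: the names commit(), handle_request() and notify_client are
hypothetical, and the log is a plain append-only file. The point is that the
client is told "committed" only after fsync() returns, so if fsync() is
silently suppressed, the client may already have acted on a transaction that
a crash then loses.

```python
import os


def commit(path, record):
    """Append one record and return only after fsync() has completed.

    Hypothetical helper: a real database uses a far more elaborate
    write-ahead log, but the durability point is the same fsync() call.
    """
    with open(path, "ab") as f:
        f.write(record)
        f.flush()              # push Python's userspace buffer to the kernel
        os.fsync(f.fileno())   # ask the kernel to put the data on stable storage
    return True


def handle_request(log_path, record, notify_client):
    """Acknowledge the client only once commit() has returned.

    notify_client is an assumed callback standing in for the network reply.
    Whatever the client does next (including further writes elsewhere)
    depends on this acknowledgement being truthful.
    """
    if commit(log_path, record):
        notify_client("committed")
```

If the system treated the os.fsync() call as a no-op, handle_request() would
still report "committed", and nothing inside the application could detect
the difference.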

Shouldn't applications have a mode to avoid spinning up the disk if it is
so important?