Re: Ext4 and the "30 second window of death"

From: david
Date: Thu Apr 02 2009 - 17:47:39 EST


On Fri, 3 Apr 2009, Nick Piggin wrote:

> On Friday 03 April 2009 05:22:48 david@xxxxxxx wrote:
>> On Wed, 1 Apr 2009, Matthew Garrett wrote:
>>
>>>> The other subtlety comes if we add fsync() suppression to laptop mode
>>>> --- which is something that Bart Samwel is very interested in doing
>>>> and I talked to him at FOSDEM about this. As Jeff Garzik recently
>>>> pointed out, however, if we let the system reorder writes across
>>>> fsync() boundaries, or if we combine two writes to the same block
>>>> separated by an fsync(), and the system crashes in the middle of
>>>> pushing all of these blocks out to the disk, we can end up trashing
>>>> the consistency guarantees of a database such as mysql or postgres.
>>>> It's a good point, but it only applies if we add fsync() suppression
>>>> to laptop mode --- which we haven't done yet.
>>>
>>> I've got absolutely no idea why anyone would want fsync() to stop
>>> meaning "Put my data on the disk please". laptop-mode isn't intended to
>>> reduce data integrity - it's intended to batch disk write-outs such that
>>> there's a lower risk of needing to perform further write-outs in future.
>>> It makes sense for applications which really desperately want
>>> information on disk to fsync() (for instance, saving a file in
>>> OpenOffice).
>>>
>>> laptop-mode is something that makes sense as a default behaviour under a
>>> lot of circumstances. Adding fsync() suppression means it's utterly
>>> impossible to use it in that way. An additional mode would be perfectly
>>> reasonable, as long as it's made clear that it's really a request for
>>> data to be discarded at some point. The current mode isn't.

>> this issue seems pretty straightforward to me
>>
>> the apps do fsync (and similar) to the degree that they think their data
>> is important (potentially with config options if they acknowledge that
>> their data isn't _always_ that important)
>>
>> the system allows the admin to override the application and say "I'm
>> willing to lose up to X seconds of data for other benefits"
>>
>> if this can work cleanly (with the ordering issue that was identified,
>> which may involve having multiple versions of the metadata cached) it
>> seems like a very clean interface.
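
To make the "lose up to X seconds" semantics concrete, here is a rough
user-space model of the idea. It is only a sketch under stated
assumptions: BoundedLossSync and its parameters are hypothetical names
invented for illustration, not a kernel or libc interface, and it
ignores the write-ordering issue mentioned above entirely.

```python
import os
import time


class BoundedLossSync:
    """Hypothetical model of an admin-set 'lose at most X seconds' policy.

    Not a real kernel API: it only illustrates deferring sync requests
    until the oldest deferred request is max_loss_seconds old.
    """

    def __init__(self, max_loss_seconds, clock=time.monotonic, sync=os.fsync):
        self.max_loss_seconds = max_loss_seconds
        self._clock = clock            # injectable so the policy can be tested
        self._sync = sync              # the real flush, os.fsync by default
        self._oldest_pending = None    # time of the oldest request not yet honoured

    def request_sync(self, fd):
        """Return True if a real sync was issued, False if it was deferred."""
        now = self._clock()
        if self._oldest_pending is None:
            self._oldest_pending = now
        if now - self._oldest_pending >= self.max_loss_seconds:
            self._sync(fd)
            self._oldest_pending = None
            return True
        return False
```

With max_loss_seconds set to 0 every request is honoured immediately,
which corresponds to the normal "full protection" behaviour. A real
implementation would also need a timer so that deferred data is flushed
even if no further request arrives; this sketch only flushes on the next
request.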

> It isn't just about ordering of writes to a filesystem. A database program
> commits a transaction and then tells the client that it is safe. Client
> then goes and does <something> in response to that, which may or may not
> involve more writes to the filesystem.
>
> Shouldn't applications have a mode to avoid spinning up the disk if it is
> so important?
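
For reference, the commit-then-acknowledge pattern described here looks
roughly like the following in application code. This is a minimal
sketch, not how any particular database does it; commit_record and the
file layout are assumptions made up for the example.

```python
import os


def commit_record(path, record):
    """Append one record and return only after fsync() has returned."""
    with open(path, "ab") as f:
        f.write(record)
        f.flush()              # push Python's userspace buffer to the kernel
        os.fsync(f.fileno())   # ask the kernel to put the data on stable storage
    # Only at this point does the application tell the client "committed".
    return True
```

The client acts on that return value, so if fsync() were silently
suppressed the acknowledgement could be sent for data that is still only
in memory.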

why should every application have to have a "I'm mobile" config option?

what about a user that's only mobile sometimes and wants full protection
the rest of the time? how can they easily switch every application
between 'keep the data as safe as you can' and 'save battery' modes?
will you have to restart all the apps when you unplug power to switch
their modes?

allowing the user to tell the system to override the applications when
the user wants to is _much_ easier.

David Lang