Re: Ext4 and the "30 second window of death"

From: david
Date: Thu Apr 02 2009 - 23:09:10 EST


On Fri, 3 Apr 2009, Matthew Garrett wrote:

On Thu, Apr 02, 2009 at 06:24:28PM -0700, david@xxxxxxx wrote:
On Fri, 3 Apr 2009, Matthew Garrett wrote:
No it wouldn't. The kernel would be implementing an administrator's
choice about whether fsync() is important or not. That's something that
would affect the mail client, but it's hardly a decision based on the
mail client. Sucks to be that user if they do anything involving mysql.

in the case of laptops, in 99+% of the cases the user and the
administrator are the same person. in the other cases that's something the
user should take up with the administrator, because the administrator can
do a lot of things to the system that will affect the safety of their data
(including loading a kernel that turns fsync into a noop, but more likely
involving enabling or disabling write caches on disks)

Well, yes, the administrator could hate the user. They could achieve the
same effect by just LD_PRELOADing something that stubbed out fsync() and
inserted random data into every other write(). We generally trust that
admins won't do that.

then trust the admins to make a reasonable decision for or with the user on this as well.

Benchmarks please.

if spinning down a drive saves so little power that it wouldn't make a
significant difference to battery life to leave it on, why does anyone
bother to spin the drive down?

There are various circumstances in which it's beneficial. The difference
between an optimal algorithm for typical use and an optimal algorithm
for typical use where there's an fsync() every 5 minutes isn't actually
that great.

mixing some sub-threads a bit to combine thoughts

you object to calling something like this 'laptop mode'

Ted's statements about laptop mode indicate that he believes that it delays writes for a configurable time rather than accelerating writes.

what would you think of something like the following

at the block device level an option called something like "delay_writes"

delays writes (including fsync) up to a configurable number of seconds.

if an fsync or barrier is issued the block driver figures out what pages would be written by that fsync/barrier, puts them in its queue (but doesn't start the write), puts a barrier in its queue following the pages and marks the pages COW.

if the timeout expires (or the drive spins up for other reasons) and the pages have not been modified, they get written and released by the block driver (which should take them out of COW mode).

if the pages get written to prior to the write taking place, COW kicks in and new pages are allocated for the changes. since the device driver already has those pages queued the filesystem just ends up with the copied pages and continues operation. when the drive finally gets spun up, the queued pages get written prior to anything else (preserving order in case of a crash)
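a toy user-space model of the queueing described above may make the ordering easier to follow. this is only a sketch in Python, not kernel code: the names DelayedWriteQueue, write_page, fsync and flush are made up for illustration, and keeping an immutable copy of the page data stands in for marking the pages COW.

```python
from collections import deque

BARRIER = object()  # hypothetical marker queued after each fsync's pages


class DelayedWriteQueue:
    """Toy model of the proposed "delay_writes" block-level option."""

    def __init__(self):
        self.pages = {}        # live page contents seen by the filesystem
        self.dirty = set()     # pages modified since the last fsync
        self.queue = deque()   # frozen (page_no, data) entries and barriers
        self.disk = []         # order in which writes reach the "disk"

    def write_page(self, page_no, data):
        # Queued snapshots are immutable bytes, so a later write only
        # replaces the live copy (standing in for the COW step).
        self.pages[page_no] = bytes(data)
        self.dirty.add(page_no)

    def fsync(self):
        # Queue the pages this fsync would write, but do not start the write.
        for page_no in sorted(self.dirty):
            self.queue.append((page_no, self.pages[page_no]))
        self.queue.append(BARRIER)
        self.dirty.clear()

    def flush(self):
        # Timeout expired or the drive spun up: drain strictly in queue order.
        while self.queue:
            entry = self.queue.popleft()
            if entry is not BARRIER:
                self.disk.append(entry)


q = DelayedWriteQueue()
q.write_page(7, b"v1")
q.fsync()                  # v1 is queued, drive stays spun down
q.write_page(7, b"v2")     # changes the live page only, queued v1 is intact
q.fsync()
q.flush()
print(q.disk)              # [(7, b'v1'), (7, b'v2')]
```

in this model the older version of page 7 reaches the disk before the newer one, which is the ordering the barriers are meant to preserve.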

doing this could cost memory (as there may be multiple copies of something queued), so it may be worth having some trigger that if more than X pages are queued by the block driver, it should go ahead and spin up the drive to write them.
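the memory trigger could look something like the following sketch, again a toy Python model with made-up names (CappedQueue, max_queued_pages, spin_up_and_flush). it assumes the limit is a simple count of queued pages, which is only one way "X" could be defined.

```python
class CappedQueue:
    """Toy model: spin the drive up early once too many pages are queued."""

    def __init__(self, max_queued_pages):
        self.max_queued_pages = max_queued_pages  # the "X" in the text
        self.queued = []      # page snapshots waiting for the drive
        self.written = []     # snapshots that have reached the "disk"
        self.spinups = 0      # number of forced spin-ups

    def enqueue(self, page_no, data):
        self.queued.append((page_no, bytes(data)))
        # More than X pages held in memory: stop delaying and write them out.
        if len(self.queued) > self.max_queued_pages:
            self.spin_up_and_flush()

    def spin_up_and_flush(self):
        self.spinups += 1
        self.written.extend(self.queued)  # oldest first, preserving order
        self.queued.clear()


q = CappedQueue(max_queued_pages=3)
for i in range(5):
    q.enqueue(i, b"data")
print(q.spinups, len(q.written), len(q.queued))  # 1 4 1
```

with a limit of 3, the fourth queued page forces one spin-up that writes all four, and the fifth page starts a new delayed batch.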

thoughts?

David Lang