Re: Linux 2.6.29

From: Linus Torvalds
Date: Fri Apr 03 2009 - 19:59:48 EST




On Fri, 3 Apr 2009, Jeff Garzik wrote:
>
> If all you want to do is _start_ the write-out from kernel to disk, and let
> the kernel handle it asynchronously, SYNC_FILE_RANGE_WRITE will do that for
> you, eliminating the need for a separate thread.

It may not eliminate the need for a separate thread.

SYNC_FILE_RANGE_WRITE will still block on things. It just will block on
_much_ less than fsync.

In particular, it will block on:

 - actually queuing up the IO (ie we need to get the bio, request etc all
   allocated and queued up)

 - if a page is under writeback, and has been marked dirty since that
   writeback started, we'll wait for that IO to finish in order to start a
   new one.

and depending on load, both of these things _can_ be issues and you might
still want to do the SYNC_FILE_RANGE_WRITE in an async thread separate
from the main loop so that the latency of the main loop is not
affected by that.
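
A minimal sketch of that arrangement, assuming a POSIX-threads program on
Linux. The names flush_job, flush_worker and start_writeback_async are made
up for illustration; only sync_file_range() and the pthread calls are real
interfaces. The helper thread absorbs whatever blocking the call does, so
the main loop only pays for pthread_create():

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <pthread.h>

/* Hypothetical job descriptor handed to the helper thread. */
struct flush_job {
    int fd;      /* file to start writeback on */
    int result;  /* return value of sync_file_range(), set by the worker */
    int err;     /* errno saved by the worker if result is -1 */
};

/* Runs in the helper thread: any blocking happens here, not in the caller. */
static void *flush_worker(void *arg)
{
    struct flush_job *job = arg;

    /* offset 0 with nbytes 0 means "from the start to the end of the file" */
    job->result = sync_file_range(job->fd, 0, 0, SYNC_FILE_RANGE_WRITE);
    job->err = (job->result == -1) ? errno : 0;
    return NULL;
}

/* Returns 0 if the helper thread was created, else a pthread error number. */
int start_writeback_async(struct flush_job *job, pthread_t *tid)
{
    return pthread_create(tid, NULL, flush_worker, job);
}
```

The caller keeps *job alive until it has done pthread_join() on the thread,
and only then looks at job->result. A long-running program would more likely
feed a single persistent worker from a queue than create a thread per flush.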

The latencies will be _much_ smaller issues than with f[data]sync(),
though, especially if you're not ever really hitting the limits on the
disk subsystem. Because those will additionally

 - wait for all old writeback to complete (whether the page was dirtied
   after the writeback started or not)

 - additionally, wait for all the new writeback it started.

 - wait for the metadata too (fsync()).

so they are pretty much _guaranteed_ to sleep for actual IO to complete
(unless you didn't write anything at all to the file ;)
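
To put the three calls side by side, here is a rough sketch. The
flush_level enum and flush_file() are hypothetical names used only for this
example; sync_file_range(), fdatasync() and fsync() are the real calls.
Note that the start-only variant gives no durability guarantee by itself,
since it neither waits for the IO nor writes out metadata:

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical levels, ordered from cheapest to most expensive. */
enum flush_level {
    FLUSH_START_ONLY,        /* queue the IO, do not wait for it */
    FLUSH_DATA,              /* wait for the file data to reach the disk */
    FLUSH_DATA_AND_METADATA  /* wait for data and metadata */
};

/* Returns 0 on success, -1 on error with errno set by the underlying call. */
int flush_file(int fd, enum flush_level level)
{
    switch (level) {
    case FLUSH_START_ONLY:
        /* offset 0, nbytes 0: the whole file */
        return sync_file_range(fd, 0, 0, SYNC_FILE_RANGE_WRITE);
    case FLUSH_DATA:
        return fdatasync(fd);
    case FLUSH_DATA_AND_METADATA:
        return fsync(fd);
    }
    return -1;
}
```

A latency-sensitive loop might call flush_file(fd, FLUSH_START_ONLY) as it
goes and save one of the waiting variants for the points where the data
really has to be on disk.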

> On a related subject, reads: consider posix_fadvise(POSIX_FADV_SEQUENTIAL)
> and/or readahead(2) for optimizing the reading side of things.

I doubt POSIX_FADV_SEQUENTIAL will do very much. The kernel tends to
figure out the read patterns on its own pretty well. Of course, explicit
readahead() can be noticeable for the right patterns.
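
For reference, a small sketch of both calls on Linux. hint_sequential()
and prefetch_range() are hypothetical wrapper names; posix_fadvise() and
readahead() are the real interfaces. The two report errors differently,
which the wrappers paper over:

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>

/* Advisory hint for the whole file. Returns 0 on success, -1 on error. */
int hint_sequential(int fd)
{
    /* posix_fadvise() returns the error number directly, not via errno */
    int err = posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);
    if (err != 0) {
        errno = err;
        return -1;
    }
    return 0;
}

/* Ask the kernel to pull [offset, offset + len) into the page cache.
 * Returns 0 on success, -1 on error with errno set. */
int prefetch_range(int fd, off_t offset, size_t len)
{
    return readahead(fd, offset, len) == 0 ? 0 : -1;
}
```

prefetch_range() is the one worth trying when the program knows which
region it will need next and the kernel's own pattern detection cannot
guess it, for example before jumping to a distant offset.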

Linus