Re: fsync on large files

Theodore Y. Ts'o (tytso@MIT.EDU)
Sun, 14 Feb 1999 18:37:05 -0500 (EST)


From: Andi Kleen <ak@muc.de>
Date: 14 Feb 1999 09:29:06 +0100

This is a very important patch, because fsync() is needed in some
transaction oriented databases (they have to call fsync() or fdatasync()
on the log frequently to commit operations). For some of them, 4 blocks
might not be enough, though; perhaps a more generic version of your
patch that handles more than 4 blocks could be found, like keeping
the dirty blocks per inode on a special list.

If you look at my patch, you'll see that the number of blocks saved is
configurable by adjusting the NUM_EXT2_FFSYNC_BLKS #define. While it
might make sense to increase that number, note that past a certain
point, you might as well simply go through all of the indirect blocks.

Adding blocks to the ffsync list is an order N-squared operation, due to
the need to check whether the block is already on the list. Originally,
I didn't have this check, but my testing showed that without it, the
list very quickly became overwhelmed with duplicate entries, making it
pretty much useless.

We could increase NUM_EXT2_FFSYNC_BLKS, but it would be useful to get
some actual performance results about how many blocks a typical
transaction oriented database actually tends to write out before calling
fsync() or fdatasync(). What would be most useful would be a chart
showing number of blocks written since the last fsync() versus
probability, so we could say authoritatively that databases dirtied (for
example) 8 blocks before calling fsync() 80% of the time.

Anyone interested in doing some data gathering?

- Ted

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/