Re: Syncing a file's metadata in a portable way

From: Alberto Bertogli
Date: Sun Jul 11 2004 - 09:18:11 EST


On Sun, Jul 11, 2004 at 03:35:27AM -0700, Andrew Morton wrote:
> bert hubert <ahu@xxxxxxx> wrote:
> >
> > On Sat, Jul 10, 2004 at 01:14:59PM -0700, Andrew Morton wrote:
> >
> > > If only the one file has been written to, an fsync on ext3 shouldn't
> > > produce any more writeout than an fsync on ext2.
> > (...)
> > > Either that, or SQLite is broken.
> >
> > I'll show strace and vmstat tomorrow - I found very little writes, no mmap,
> > some fsync and massive writeouts. On ext2, performance was two orders of
> > magnitude better.
> >
>
> One scenario which could cause this is if the application is writing a
> large amount of data to a file and is repeatedly *overwriting* that data.
> And the application is repeatedly adding new blocks to, and fsyncing a
> separate file.

I don't know about SQLite, but I've written a small transactional I/O
library and it seems to trigger this behaviour too.

I test with fsx opening files O_SYNC against fsx using the library with a
mode called "lingering transactions" that write the data synchronously
only once when the trasaction is commited (and fsync()s at the end, which
doesn't seem to make a significant difference).

In this mode the library creates a file for each transaction, write to it
using pwrite and then fsync both the file and the parent directory. Then
it uses pwrite to write to the real file, without syncing it.

I'm using an USB flash so disk seeks are not so costly. Here are the
results, running "fsx -R -W -p 1024 -N 1000 testfile" as root, on the
flash. For more operations (-N) the relation between the tests is pretty
much the same.

Tests are:
* sync: fsx opening everything O_SYNC (uses write())
* linger: fsx using the library with the method described avobe (uses
pwrite and fsync)

Time is measured with "time" (real), and the time spent in write and fsync
with ltrace -S -c (in seconds), taken in different runs so ltrace overhead
doesn't show up in time. The other functions and system calls don't make a
significant difference.

I tested ext2, ext3 with data=ordered and data=writeback, without any
mount options.


test fs total time write fsync ltrace total

sync ext2 0m22.956s 69.007234 --- 153.888504
linger ext2 0m27.358s --- 81.107975 191.014929

sync ext3-o 0m23.709s 69.143989 --- 162.130448
linger ext3-o 0m37.234s --- 109.51823 243.963197

sync ext3-w 0m22.622s 71.071572 --- 160.095286
linger ext3-w 0m26.429s --- 76.482683 188.377637


So ext3 in writeback mode has almost the same numbers as ext2, but using
ordered mode is much more slower in the library case.


Thanks,
Alberto


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/