Re: Tux3 Report: Faster than tmpfs, what?

From: Daniel Phillips
Date: Sun May 12 2013 - 00:28:53 EST

(resent as plain text)

On Sat, May 11, 2013 at 2:26 PM, Theodore Ts'o <tytso@xxxxxxx> wrote:
> Dropping fsync() does a lot more than "amplify Tux3's advantage in
> delete performance". Since fsync(2) is defined as not returning until
> the data written to the file descriptor is flushed out to stable
> storage --- so it is guaranteed to be seen after a system crash --- it
> means that the foreground application must not continue until the data
> is written by Tux3's back-end.
> So it also means that any advantage of decoupling the front/back end
> is nullified, since fsync(2) requires a temporal coupling. In fact,
> if there are any delays introduced between when the front-end sends the
> fsync request, and when the back-end finishes writing the data and
> then communicates this back to the front-end --- i.e., caused by
> scheduler latencies, this may end up being a disadvantage compared to
> more traditional file system designs.
> Like many things in file system design, there are tradeoffs. It's
> perhaps more useful when having these discussions to be clear what
> you are trading off for what; in this case, the front/back design may
> be good for some things, and less good for others, such as mail server
> workloads where fsync(2) semantics are extremely important for
> application correctness.
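Ted's mail-server example is the classic case where fsync(2) carries correctness: a delivery agent should acknowledge a message only once it is durable. A minimal sketch of the common write-to-temp, fsync, rename, fsync-directory pattern in C follows. The helper names `durable_write` and `cleanup_fail` and the ".name.tmp" naming are hypothetical illustrations, not code from Tux3 or from any particular mail server.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: close fd (if open) and remove tmp, keeping errno. */
static int cleanup_fail(int fd, const char *tmp)
{
    int saved = errno;
    if (fd >= 0)
        close(fd);
    unlink(tmp);
    errno = saved;
    return -1;
}

/* Hypothetical helper: write buf to dir/name so that it survives a crash.
 * Returns 0 on success, -1 on failure (errno from the failing call). */
static int durable_write(const char *dir, const char *name,
                         const void *buf, size_t len)
{
    char tmp[4096], dst[4096];
    int n1 = snprintf(tmp, sizeof tmp, "%s/.%s.tmp", dir, name);
    int n2 = snprintf(dst, sizeof dst, "%s/%s", dir, name);
    if (n1 < 0 || n2 < 0 || (size_t)n1 >= sizeof tmp || (size_t)n2 >= sizeof dst) {
        errno = ENAMETOOLONG;
        return -1;
    }

    int fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;            /* interrupted before any data: retry */
            return cleanup_fail(fd, tmp);
        }
        p += n;
        len -= (size_t)n;
    }

    /* The caller blocks here until the data reaches stable storage. */
    if (fsync(fd) < 0)
        return cleanup_fail(fd, tmp);
    if (close(fd) < 0)
        return cleanup_fail(-1, tmp);

    /* Publish under the final name; rename() replaces dst atomically. */
    if (rename(tmp, dst) < 0)
        return cleanup_fail(-1, tmp);

    /* fsync the directory as well, so the new directory entry is durable. */
    int dfd = open(dir, O_RDONLY);
    if (dfd < 0)
        return -1;
    int rc = fsync(dfd);
    int saved = errno;
    close(dfd);
    errno = saved;
    return rc;
}
```

Each fsync() in this sketch is a point where the application must wait for the filesystem's back end, which is exactly the temporal coupling Ted describes.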

Exactly, Ted. We avoided measuring the fsync load on this particular
benchmark because we have not yet optimized fsync. When we do get to
it (not an immediate priority) I expect Tux3 to perform competitively,
because our delta commit scheme does manage the job with a minimal
number of block writes. To have a really efficient fsync we need to
isolate just the changes for the fsynced file into a special "half
delta" that gets its own commit, ahead of any other pending changes
to the filesystem. There is a plan for this; however, we would rather
not get sidetracked on that project now while we are getting ready
for merge.
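To make the coupling concrete, here is a hypothetical toy model in C with pthreads, not Tux3 code and not its API: a front end accumulates changes in an open "delta", a back-end thread commits deltas, and fsync cannot return until the back end reports the caller's delta as durable. The struct and function names (`toy_fs`, `toy_fsync`, and so on) are invented for illustration. The model commits the whole open delta, i.e. the unoptimized case; the proposed "half delta" would instead commit only the fsynced file's changes ahead of everything else.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy model of a front/back split; NOT Tux3's code or API. */
struct toy_fs {
    pthread_mutex_t lock;
    pthread_cond_t  work;      /* front end -> back end: please commit */
    pthread_cond_t  done;      /* back end -> front end: a delta is durable */
    uint64_t open_delta;       /* delta currently collecting front-end changes */
    uint64_t durable_delta;    /* newest delta the back end has committed */
    bool commit_requested;
    bool stop;
    pthread_t backend;
};

static void *toy_backend(void *arg)
{
    struct toy_fs *fs = arg;
    pthread_mutex_lock(&fs->lock);
    for (;;) {
        while (!fs->commit_requested && !fs->stop)
            pthread_cond_wait(&fs->work, &fs->lock);
        if (!fs->commit_requested)
            break;                          /* stop requested, nothing pending */
        uint64_t delta = fs->open_delta++;  /* close this delta, open the next */
        fs->commit_requested = false;
        pthread_mutex_unlock(&fs->lock);
        /* A real back end would write the delta's blocks here. */
        pthread_mutex_lock(&fs->lock);
        fs->durable_delta = delta;
        pthread_cond_broadcast(&fs->done);
    }
    pthread_mutex_unlock(&fs->lock);
    return NULL;
}

static int toy_init(struct toy_fs *fs)
{
    pthread_mutex_init(&fs->lock, NULL);
    pthread_cond_init(&fs->work, NULL);
    pthread_cond_init(&fs->done, NULL);
    fs->open_delta = 1;
    fs->durable_delta = 0;
    fs->commit_requested = false;
    fs->stop = false;
    return pthread_create(&fs->backend, NULL, toy_backend, fs);
}

/* fsync in this model: the caller waits until the back end has committed
 * the delta holding its changes, so each call includes at least one round
 * trip through the scheduler (wake back end, then be woken) plus the I/O. */
static uint64_t toy_fsync(struct toy_fs *fs)
{
    pthread_mutex_lock(&fs->lock);
    uint64_t target = fs->open_delta;
    fs->commit_requested = true;
    pthread_cond_signal(&fs->work);
    while (fs->durable_delta < target)
        pthread_cond_wait(&fs->done, &fs->lock);
    pthread_mutex_unlock(&fs->lock);
    return target;
}

static void toy_shutdown(struct toy_fs *fs)
{
    pthread_mutex_lock(&fs->lock);
    fs->stop = true;
    pthread_cond_signal(&fs->work);
    pthread_mutex_unlock(&fs->lock);
    pthread_join(fs->backend, NULL);
    pthread_cond_destroy(&fs->done);
    pthread_cond_destroy(&fs->work);
    pthread_mutex_destroy(&fs->lock);
}
```

Without fsync, the front end never blocks on `done` and the back end commits at its own pace; with fsync, every call serializes the caller behind a back-end commit, which is why fsync-heavy loads are measured separately.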

The point that seems to be getting a little lost in this thread is that
the benchmark, just as we ran it, models an important and common type
of workload, arguably the most common workload for real users, and
the resulting performance measurement is easily reproducible for
anyone who cares to try. In fact, I think we should prepare and
post a detailed recipe for doing just that, since the interest
level seems to be high.



PS for any Googlers reading: do you know that using Gmail to post to
LKML is simply maddening for all concerned? If you want to know why,
try it yourself. Plain text. Some people need it, and need it to
be reliable instead of gratuitously changing back to html at
surprising times. And static word wrap. Necessary.