Re: application syncing options (was Re: [PATCH] Memory management livelock)

From: david
Date: Mon Oct 06 2008 - 23:37:41 EST


On Sun, 5 Oct 2008, Mikulas Patocka wrote:

On Sun, 5 Oct 2008, david@xxxxxxx wrote:

On Sun, 5 Oct 2008, Mikulas Patocka wrote:

On Fri, 3 Oct 2008, david@xxxxxxx wrote:

I've also seen discussions of how the kernel filesystem code can use
barriers to do ordered writes without having to wait for them. Is this
capability exported to userspace? If so, could you point me at
documentation for it?

It isn't. And it is good that it isn't --- the more complicated the API,
the more maintenance work.

I can understand that most software would not want to deal with complications
like this, but for things that have requirements similar to journaling
filesystems (databases, for example) it would seem that there would be
advantages to exposing this capability.

David Lang

If you invent a new interface that allows submitting several ordered IOs
from userspace, it will require excessive maintenance overhead over a long
period of time. So it is only justified if the performance improvement is
correspondingly large.

It should not be like "here you improve 10% performance on some synthetic
benchmark in one application that was rewritten to support the new
interface" and then create a few more security vulnerabilities (because of
the complexity of the interface) and damage overall Linux progress,
because everyone is catching bugs in the new interface and checking it for
correctness.

the same benchmarks that show that it's far better for the in-kernel filesystem code to use write barriers should apply to FUSE filesystems.

this isn't a matter of a few % in performance. if an application is sync-limited in a way that can be converted to write-ordered, the potential is for the application to speed up by many times.

programs that maintain indexes or caches of data that lives in other files will be able to write data && barrier && write index && fsync and double their performance vs write data && fsync && write index && fsync
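for reference, here is a minimal sketch in C of the second sequence above (write data && fsync && write index && fsync), which is what userspace has to do today. the file layout and the names write_all() and update_data_then_index() are made up for illustration, not taken from any real program. the first fsync() is there only to enforce ordering; it is the call a userspace barrier would replace.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Write the whole buffer, retrying on short writes and EINTR.
 * Returns 0 on success, -1 on error. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical helper: append a record to the data file, then append an
 * entry that refers to it to the index file. Returns 0 on success. */
static int update_data_then_index(const char *data_path,
                                  const char *index_path,
                                  const char *record,
                                  const char *index_entry)
{
    int rc = -1;
    int dfd = open(data_path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    int ifd = open(index_path, O_WRONLY | O_CREAT | O_APPEND, 0644);

    if (dfd < 0 || ifd < 0)
        goto out;
    if (write_all(dfd, record, strlen(record)) < 0)
        goto out;
    /* Ordering point: the index must never describe data that is not
     * yet on disk, so we have to wait for the data here. */
    if (fsync(dfd) < 0)
        goto out;
    if (write_all(ifd, index_entry, strlen(index_entry)) < 0)
        goto out;
    /* Durability point: this fsync is still needed in the barrier case. */
    if (fsync(ifd) < 0)
        goto out;
    rc = 0;
out:
    if (dfd >= 0)
        close(dfd);
    if (ifd >= 0)
        close(ifd);
    return rc;
}
```

a caller would use it as update_data_then_index("data.bin", "index.bin", record, entry). the program stalls twice per update, once at each fsync().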

databases can potentially do even better. today they need to fsync data to disk before they can update their journal to indicate that the data has been written. with a barrier they could order the writes so that the write to the journal doesn't happen until after the writes of the data. they would never need to call fsync at all (when emptying the journal).
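a rough sketch in C of that journal-emptying step as it has to be done today. the page size, the record layout, and the name checkpoint_page() are all assumptions for illustration and are not taken from any real database. the fsync() between the two writes is the one that an ordering guarantee would make unnecessary.

```c
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

#define PAGE_SIZE_BYTES 4096 /* hypothetical database page size */

/* Hypothetical journal record: "page N is now in the data file". */
struct checkpoint_record {
    uint64_t page_no;
};

/* Write one page into place in the data file, wait for it to reach
 * disk, and only then note in the journal that it has been written.
 * journal_fd is assumed to be opened with O_APPEND.
 * Returns 0 on success, -1 on error. */
static int checkpoint_page(int data_fd, int journal_fd,
                           uint64_t page_no, const unsigned char *page)
{
    off_t off = (off_t)page_no * PAGE_SIZE_BYTES;
    struct checkpoint_record rec = { .page_no = page_no };

    if (pwrite(data_fd, page, PAGE_SIZE_BYTES, off) != PAGE_SIZE_BYTES)
        return -1;
    /* Without an ordering primitive, the only way to make sure the
     * journal update does not reach disk first is to wait here. */
    if (fsync(data_fd) < 0)
        return -1;
    if (write(journal_fd, &rec, sizeof rec) != (ssize_t)sizeof rec)
        return -1;
    return 0;
}
```

the journal write itself is not synced here, because nothing is lost if it reaches disk late: the page is already safe, and at worst it gets written again during recovery.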

for systems without solid-state drives or battery-backed caches, the ability to eliminate fsyncs by being able to rely on the order of the writes is a huge benefit.

David Lang