Re: [PATCH 07/13] aio: enabled thread based async fsync

From: Dave Chinner
Date: Mon Jan 11 2016 - 21:25:56 EST


On Mon, Jan 11, 2016 at 05:20:42PM -0800, Linus Torvalds wrote:
> On Mon, Jan 11, 2016 at 5:11 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> >
> > Insufficient. Needs the range to be passed through and call
> > vfs_fsync_range(), as I implemented here:
>
> And I think that's insufficient *also*.
>
> What you actually want is "sync_file_range()", with the full set of arguments.

That's a different interface. The aio fsync interface has been
exposed to userspace for years, we just haven't implemented it in
the kernel. That's a major difference to everything else being
proposed in this patch set, especially this one.
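
For reference, the existing userspace side of this can be sketched
roughly as below. This is an illustration, not code from the patch set:
the helper name aio_fsync_wait() is made up, and it uses the raw
syscalls with the IOCB_CMD_FSYNC/IOCB_CMD_FDSYNC opcodes from
<linux/aio_abi.h>. On a kernel that does not implement the opcode,
io_submit() is expected to fail (EINVAL, as far as I know), and the
helper simply reports that error.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <linux/aio_abi.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical helper: submit a single AIO fsync (or fdatasync when
 * datasync is non-zero) on fd and wait for its completion.
 * Returns 0 on success or a negative errno value.
 */
int aio_fsync_wait(int fd, int datasync)
{
	aio_context_t ctx = 0;
	struct iocb cb;
	struct iocb *list[1] = { &cb };
	struct io_event ev;
	int ret;

	/* Create an AIO context able to hold one in-flight request. */
	if (syscall(SYS_io_setup, 1, &ctx) < 0)
		return -errno;

	memset(&cb, 0, sizeof(cb));
	cb.aio_fildes = fd;
	cb.aio_lio_opcode = datasync ? IOCB_CMD_FDSYNC : IOCB_CMD_FSYNC;

	if (syscall(SYS_io_submit, ctx, 1, list) != 1) {
		/* Kernels without an implementation reject the opcode here. */
		ret = -errno;
	} else if (syscall(SYS_io_getevents, ctx, 1, 1, &ev, NULL) != 1) {
		ret = -errno;
	} else {
		/* ev.res carries the fsync result: 0 or a negative errno. */
		ret = (int)ev.res;
	}

	syscall(SYS_io_destroy, ctx);
	return ret;
}
```

A real application would keep the context around and reap the
completion alongside its other AIO events rather than waiting inline.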

FYI sync_file_range() is definitely not a fsync/fdatasync
replacement as it does not guarantee data durability in any way.
i.e. you can call sync_file_range, have it wait for data to be
written, return to userspace, then lose power and lose the data that
sync_file_range said it wrote. That's because sync_file_range()
does not:

	a) write the metadata needed to reference the data to disk;
	   and
	b) flush volatile storage caches after data and metadata are
	   written.

Hence sync_file_range is useless to applications that need to
guarantee data durability. Not to mention that most AIO applications
use direct IO, and so have no use for fine-grained control over page
cache writeback semantics. They only require a) and b) above, so
implementing the AIO fsync primitive is exactly what they want.
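
To make the difference concrete, here is a minimal sketch in C. The
helper names writeback_range_only() and write_durably() are made up for
illustration. The first one only pushes the page cache range to the
device and waits for that IO; it does neither a) nor b). The second one
uses fdatasync(), which is the call an application has to make if it
needs the data to survive a power loss.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Page cache writeback only: start writeback on [off, off + len) and
 * wait for it. No metadata is written and no storage cache is flushed,
 * so a successful return says nothing about durability.
 */
int writeback_range_only(int fd, off_t off, off_t len)
{
	unsigned int flags = SYNC_FILE_RANGE_WAIT_BEFORE |
			     SYNC_FILE_RANGE_WRITE |
			     SYNC_FILE_RANGE_WAIT_AFTER;

	return sync_file_range(fd, off, len, flags) == 0 ? 0 : -errno;
}

/*
 * Write len bytes at offset off, then ask for durability: fdatasync()
 * covers the data plus the metadata needed to read it back, and leaves
 * the storage cache flush to the filesystem.
 * Returns 0 on success or a negative errno value.
 */
int write_durably(int fd, const void *buf, size_t len, off_t off)
{
	const char *p = buf;

	while (len > 0) {
		ssize_t n = pwrite(fd, p, len, off);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -errno;
		}
		if (n == 0)
			return -EIO;	/* no progress, give up */
		p += n;
		len -= (size_t)n;
		off += n;
	}

	return fdatasync(fd) == 0 ? 0 : -errno;
}
```

An application that calls only writeback_range_only() after its writes
can see it return 0 and still lose the data on power failure.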

> Yes, really. Sometimes you want to start the writeback, sometimes you
> want to wait for it. Sometimes you want both.

Without durability guarantees, such application-level optimisations
are pretty much worthless.

> I think this only strengthens my "stop with the idiotic
> special-case-AIO magic already" argument. If we want something more
> generic than the usual aio, then we should go all in. Not "let's make
> more limited special cases".

No, I don't think this specific case does, because the AIO fsync
interface already exists....

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx