Re: [PATCH v4 16/31] loop: use aio to perform io on the underlyingfile

From: Dave Chinner
Date: Mon Dec 03 2012 - 21:52:13 EST


On Mon, Dec 03, 2012 at 10:59:39AM -0600, Dave Kleikamp wrote:
> On 11/22/2012 05:06 PM, Dave Chinner wrote:
> > On Wed, Nov 21, 2012 at 04:40:56PM -0600, Dave Kleikamp wrote:
> >> From: Zach Brown <zab@xxxxxxxxx>
> >>
> >> This uses the new kernel aio interface to process loopback IO by
> >> submitting concurrent direct aio. Previously loop's IO was serialized
> >> by synchronous processing in a thread.
> >>
> >> The aio operations specify the memory for the IO with the bio_vec arrays
> >> directly instead of mappings of the pages.
> >>
> >> The use of aio operations is enabled when the backing file supports the
> >> read_iter and write_iter methods. These methods must only be added when
> >> O_DIRECT on bio_vecs is supported.
> >>
> >> Signed-off-by: Dave Kleikamp <dave.kleikamp@xxxxxxxxxx>
> >> Cc: Zach Brown <zab@xxxxxxxxx>
> >
> > I suspect aio iocompetion here doesn't work for FUA write IO.
>
> Since the underlying file system is doing direct IO, we at least know
> that the IO has been performed to the underlying device. It could still
> be in the devices write cache, so maybe an fsync is still needed. It
> wouldn't be hard to fix if that's the right thing to do.

FUA means the write is all the way to stable storage when completion
is signalled by the underlying device. That means the loop device
needs to (at least) fdatasync after the write completes...

> >> +static void lo_rw_aio_complete(u64 data, long res)
> >> +{
> >> + struct bio *bio = (struct bio *)(uintptr_t)data;
> >> +
> >> + if (res > 0)
> >> + res = 0;
> >> + else if (res < 0)
> >> + res = -EIO;
> >> +
> >> + bio_endio(bio, res);
> >> +}
> >
> > This effectively does nothing...
>
> It passes the IO completion from the underlying device to the caller.

Yes, I know. What I'm saying is that it does nothing on completion
of a FUA write when it should be doing a fsync.....

> >> - lo->lo_encrypt_key_size) {
> >> - ret = -EOPNOTSUPP;
> >> - goto out;
> >> - }
> >> - ret = file->f_op->fallocate(file, mode, pos,
> >> - bio->bi_size);
> >> - if (unlikely(ret && ret != -EINVAL &&
> >> - ret != -EOPNOTSUPP))
> >> - ret = -EIO;
> >> - goto out;
> >> - }
> >> -
> >> ret = lo_send(lo, bio, pos);
> >>
> >> if ((bio->bi_rw & REQ_FUA) && !ret) {
> >
> > And as you can see here that after writing the data in the filebacked
> > path, there's special handling for REQ_FUA (i.e. another fsync).
> > ....
>
> In this path, the data may still be in the underlying filesystem's page
> cache.

Sure, but fsync means that it ends up on stable storage, as per the
requirement of an FUA write. Direct IO does not mean the data is on
stable storage, just that it bypasses the page cache.

> > And this extra fsync is now not done in the aio path. I.e. the AIO
> > completion path needs to issue the fsync to maintain correct REQ_FUA
> > semantics...
>
> If this is really necessary, I can fix it.

Absolutely. If we don't implement FUA properly, we'll end up with
corrupted filesystems and/or data loss when kernel crashes or
powerloss occurs. That's not an acceptable outcome, so we need FUA
to be implemented properly.

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/