Re: [PATCH v3 17/21] iomap: Atomic write support

From: John Garry
Date: Thu May 02 2024 - 05:13:30 EST


On 02/05/2024 02:43, Dave Chinner wrote:
On Wed, May 01, 2024 at 12:08:34PM +0100, John Garry wrote:
On 01/05/2024 02:47, Dave Chinner wrote:
On Mon, Apr 29, 2024 at 05:47:42PM +0000, John Garry wrote:
Support atomic writes by producing a single BIO with REQ_ATOMIC flag set.

We rely on the FS to guarantee extent alignment, such that an atomic write
should never straddle two or more extents. The FS should also check for
validity of an atomic write length/alignment.

Signed-off-by: John Garry <john.g.garry@xxxxxxxxxx>
---
...
+
bio->bi_private = dio;
bio->bi_end_io = iomap_dio_bio_end_io;
@@ -403,6 +407,12 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
}
n = bio->bi_iter.bi_size;
+ if (is_atomic && n != orig_count) {
+ /* This bio should have covered the complete length */
+ ret = -EINVAL;
+ bio_put(bio);
+ goto out;
+ }

What happens now if we've done zeroing IO before this? I suspect we
might expose stale data if the partial block zeroing converts the
unwritten extent in full...

We use iomap_dio.ref to ensure that __iomap_dio_rw() does not return until
any zeroing and actual sub-io block write completes. See iomap_dio_zero() ->
iomap_dio_submit_bio() -> atomic_inc(&dio->ref) callchain. I meant to add
such info to the commit message, as you questioned this previously.

Yes, I get that. But my point is that we may have only done -part-
of a block unaligned IO.

This is effectively a failure from a bio_iov_iter_get_pages() call.
What does the bio_iov_iter_get_pages() failure case do that this new
failure case not do? Why does this case have different failure
handling?


So you are saying that if we fail here (that is the (is_atomic && n != orig_count) check), anything unwritten in the atomic write region and zerotail region could expose stale data, right?

If so, I would say that we need to zero the complete unwritten atomic write and zerotail regions - similar to the bio_iov_iter_get_pages() failure handling - and still report an -EINVAL error.

How does that sound?

Thanks,
John