Re: [PATCH v3 17/21] iomap: Atomic write support

From: John Garry
Date: Wed May 01 2024 - 07:09:45 EST


On 01/05/2024 02:47, Dave Chinner wrote:
On Mon, Apr 29, 2024 at 05:47:42PM +0000, John Garry wrote:
Support atomic writes by producing a single BIO with REQ_ATOMIC flag set.

We rely on the FS to guarantee extent alignment, such that an atomic write
should never straddle two or more extents. The FS should also check for
validity of an atomic write length/alignment.

Signed-off-by: John Garry <john.g.garry@xxxxxxxxxx>
---
fs/iomap/direct-io.c | 10 ++++++++++
1 file changed, 10 insertions(+)

diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index a3ed7cfa95bc..d7bdeb675068 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -275,6 +275,7 @@ static inline blk_opf_t iomap_dio_bio_opflags(struct iomap_dio *dio,
static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
struct iomap_dio *dio)
{
+ bool is_atomic = dio->iocb->ki_flags & IOCB_ATOMIC;
const struct iomap *iomap = &iter->iomap;
struct inode *inode = iter->inode;
unsigned int zeroing_size, pad;
@@ -387,6 +388,9 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
bio->bi_iter.bi_sector = iomap_sector(iomap, pos);
bio->bi_write_hint = inode->i_write_hint;
bio->bi_ioprio = dio->iocb->ki_ioprio;
+ if (is_atomic)
+ bio->bi_opf |= REQ_ATOMIC;

REQ_ATOMIC is only valid for write IO, isn't it?

yes, it is. We reject RWF_ATOMIC for a READ.


This should be added in iomap_dio_bio_opflags() after it is
determined we are doing a write operation. Regardless, it should be
added in iomap_dio_bio_opflags(), not here. That also allows us to
get rid of the is_atomic variable.

ok


+
bio->bi_private = dio;
bio->bi_end_io = iomap_dio_bio_end_io;
@@ -403,6 +407,12 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
}
n = bio->bi_iter.bi_size;
+ if (is_atomic && n != orig_count) {
+ /* This bio should have covered the complete length */
+ ret = -EINVAL;
+ bio_put(bio);
+ goto out;
+ }

What happens now if we've done zeroing IO before this? I suspect we
might expose stale data if the partial block zeroing converts the
unwritten extent in full...

We use iomap_dio.ref to ensure that __iomap_dio_rw() does not return until any zeroing and actual sub-io block write completes. See iomap_dio_zero() -> iomap_dio_submit_bio() -> atomic_inc(&dio->ref) callchain. I meant to add such info to the commit message, as you questioned this previously.


if (dio->flags & IOMAP_DIO_WRITE) {
task_io_account_write(n);
} else {

Ignoring the error handling issues, this code might be better as:

if (dio->flags & IOMAP_DIO_WRITE) {
if ((opflags & REQ_ATOMIC) && n != orig_count) {
/* atomic writes are all or nothing */
ret = -EIO
bio_put(bio);
goto out;
}
}

so that we are not putting atomic write error checks in the read IO
submission path.


Maybe, I'll look at a rework with the suggested change to use iomap_dio_bio_opflags() - I actually thought that I introduced a change to use iomap_dio_bio_opflags() previously...

BTW, we need to return -EINVAL, as this is what userspace expects for such an error.

Thanks,
John