Re: [PATCH 4/4] iomap: cleanup iomap_write_iter()

From: Darrick J. Wong
Date: Mon Mar 11 2024 - 12:07:50 EST


On Mon, Mar 11, 2024 at 08:22:55PM +0800, Zhang Yi wrote:
> From: Zhang Yi <yi.zhang@xxxxxxxxxx>
>
> The status variable in iomap_write_iter() is confusing and
> iomap_write_end() always return 0 or copied bytes, so replace it with a
> new written variable to represent the written bytes in each cycle, and
> also do some cleanup, no logic changes.
>
> Signed-off-by: Zhang Yi <yi.zhang@xxxxxxxxxx>
> ---
> fs/iomap/buffered-io.c | 31 +++++++++++++++----------------
> 1 file changed, 15 insertions(+), 16 deletions(-)
>
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 19f91324c690..767af6e67ed4 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -851,7 +851,7 @@ static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
> loff_t length = iomap_length(iter);
> size_t chunk = PAGE_SIZE << MAX_PAGECACHE_ORDER;
> loff_t pos = iter->pos;
> - ssize_t written = 0;
> + ssize_t total_written = 0;
> long status = 0;
> struct address_space *mapping = iter->inode->i_mapping;
> unsigned int bdp_flags = (iter->flags & IOMAP_NOWAIT) ? BDP_ASYNC : 0;
> @@ -862,6 +862,7 @@ static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
> size_t offset; /* Offset into folio */
> size_t bytes; /* Bytes to write to folio */
> size_t copied; /* Bytes copied from user */
> + size_t written; /* Bytes have been written */
>
> bytes = iov_iter_count(i);
> retry:
> @@ -906,7 +907,7 @@ static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
> flush_dcache_folio(folio);
>
> copied = copy_folio_from_iter_atomic(folio, offset, bytes, i);
> - status = iomap_write_end(iter, pos, bytes, copied, folio);
> + written = iomap_write_end(iter, pos, bytes, copied, folio);
>
> /*
> * Update the in-memory inode size after copying the data into
> @@ -915,28 +916,26 @@ static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
> * no stale data is exposed.
> */
> old_size = iter->inode->i_size;
> - if (pos + status > old_size) {
> - i_size_write(iter->inode, pos + status);
> + if (pos + written > old_size) {
> + i_size_write(iter->inode, pos + written);
> iter->iomap.flags |= IOMAP_F_SIZE_CHANGED;
> }
> - __iomap_put_folio(iter, pos, status, folio);
> + __iomap_put_folio(iter, pos, written, folio);
>
> if (old_size < pos)
> pagecache_isize_extended(iter->inode, old_size, pos);
> - if (status < bytes)
> - iomap_write_failed(iter->inode, pos + status,
> - bytes - status);
> - if (unlikely(copied != status))
> - iov_iter_revert(i, copied - status);

I wish you'd made the variable renaming and the function reorganization
separate patches. The renaming looks correct to me, but moving these
calls adds a logic bomb.

If at some point iomap_write_end actually starts returning partial write
completions (e.g. you wrote 250 bytes, but for some reason the pagecache
only acknowledges 100 bytes were written) then this code no longer
reverts the iter or truncates posteof pagecache correctly...

>
> cond_resched();
> - if (unlikely(status == 0)) {
> + if (unlikely(written == 0)) {
> /*
> * A short copy made iomap_write_end() reject the
> * thing entirely. Might be memory poisoning
> * halfway through, might be a race with munmap,
> * might be severe memory pressure.
> */
> + iomap_write_failed(iter->inode, pos, bytes);
> + iov_iter_revert(i, copied);

..because now we only do that if the pagecache refuses to acknowledge
any bytes written at all. I think it actually works correctly with
today's kernel since __iomap_write_end only returns copied or 0, but the
size_t return type implies that a short acknowledgement is theoretically
possible.

IOWs, doesn't this adds a logic bomb?

--D

> +
> if (chunk > PAGE_SIZE)
> chunk /= 2;
> if (copied) {
> @@ -944,17 +943,17 @@ static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
> goto retry;
> }
> } else {
> - pos += status;
> - written += status;
> - length -= status;
> + pos += written;
> + total_written += written;
> + length -= written;
> }
> } while (iov_iter_count(i) && length);
>
> if (status == -EAGAIN) {
> - iov_iter_revert(i, written);
> + iov_iter_revert(i, total_written);
> return -EAGAIN;
> }
> - return written ? written : status;
> + return total_written ? total_written : status;
> }
>
> ssize_t
> --
> 2.39.2
>
>