Re: [PATCH v4] ceph: invalidate pages when doing direct/sync writes

From: Xiubo Li
Date: Thu Apr 07 2022 - 22:47:53 EST



On 4/7/22 11:15 PM, Luís Henriques wrote:
When doing a direct/sync write, we need to invalidate the page cache in
the range being written to. If we don't do this, the cache will include
invalid data as we just did a write that avoided the page cache.

Signed-off-by: Luís Henriques <lhenriques@xxxxxxx>
---
fs/ceph/file.c | 19 ++++++++++++++-----
1 file changed, 14 insertions(+), 5 deletions(-)

Changes since v3:
- Dropped initial call to invalidate_inode_pages2_range()
- Added extra comment to document invalidation

Changes since v2:
- Invalidation needs to be done after a write

Changes since v1:
- Replaced truncate_inode_pages_range() by invalidate_inode_pages2_range
- Call fscache_invalidate with FSCACHE_INVAL_DIO_WRITE if we're doing DIO

diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index 5072570c2203..97f764b2fbdd 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -1606,11 +1606,6 @@ ceph_sync_write(struct kiocb *iocb, struct iov_iter *from, loff_t pos,
return ret;
ceph_fscache_invalidate(inode, false);
- ret = invalidate_inode_pages2_range(inode->i_mapping,
- pos >> PAGE_SHIFT,
- (pos + count - 1) >> PAGE_SHIFT);
- if (ret < 0)
- dout("invalidate_inode_pages2_range returned %d\n", ret);
while ((len = iov_iter_count(from)) > 0) {
size_t left;
@@ -1938,6 +1933,20 @@ ceph_sync_write(struct kiocb *iocb, struct iov_iter *from, loff_t pos,
break;
}
ceph_clear_error_write(ci);
+
+ /*
+ * we need to invalidate the page cache here, otherwise the
+ * cache will include invalid data in direct/sync writes.
+ */
+ ret = invalidate_inode_pages2_range(
+ inode->i_mapping,
+ pos >> PAGE_SHIFT,
+ (pos + len - 1) >> PAGE_SHIFT);
+ if (ret < 0) {
+ dout("invalidate_inode_pages2_range returned %d\n",
+ ret);
+ ret = 0;

IMO this isn't safe. If we just ignore the error, the page cache will still contain invalid data.

I think what 'ceph_direct_read_write()' does is more correct. It first calls 'invalidate_inode_pages2_range()', which makes sure all the dirty pages in the range are written back from the page cache, without blocking. Later, if some pages could not be unmapped and 'invalidate_inode_pages2_range()' returned EBUSY, it does a blocking invalidation with 'truncate_inode_pages_range()'.

That way we can always be sure the page cache has no invalid data after the write finishes. I think it uses the truncate helper there because that is safe, since there shouldn't be any buffered writes happening for DIO?
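To make the two-step idea concrete, here is a rough user-space sketch of what I mean. It is not kernel code and not what 'ceph_direct_read_write()' literally does: every 'sketch_*' name is made up for illustration, the page cache is modelled as a small array, and both helpers take page indices for simplicity (the real 'truncate_inode_pages_range()' takes byte offsets). The only point is the control flow: try the non-forcing invalidation first, and fall back to the forced truncate on EBUSY instead of ignoring the error.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_NPAGES 8

/* Toy page-cache model (hypothetical): one entry per page index. */
struct sketch_page {
	bool cached;	/* page is present in the cache */
	bool mapped;	/* page cannot be dropped by the gentle path */
};

struct sketch_mapping {
	struct sketch_page pages[SKETCH_NPAGES];
};

/*
 * Stand-in for invalidate_inode_pages2_range(): drops every page it
 * can in [first, last] and returns -EBUSY if at least one page stays.
 */
int sketch_invalidate_range(struct sketch_mapping *m, size_t first, size_t last)
{
	int ret = 0;

	for (size_t i = first; i <= last && i < SKETCH_NPAGES; i++) {
		if (!m->pages[i].cached)
			continue;
		if (m->pages[i].mapped) {
			ret = -EBUSY;
			continue;
		}
		m->pages[i].cached = false;
	}
	return ret;
}

/* Stand-in for truncate_inode_pages_range(): drops pages unconditionally. */
void sketch_truncate_range(struct sketch_mapping *m, size_t first, size_t last)
{
	for (size_t i = first; i <= last && i < SKETCH_NPAGES; i++) {
		m->pages[i].cached = false;
		m->pages[i].mapped = false;
	}
}

/* Number of pages still cached in [first, last]. */
size_t sketch_cached_in_range(const struct sketch_mapping *m, size_t first, size_t last)
{
	size_t n = 0;

	for (size_t i = first; i <= last && i < SKETCH_NPAGES; i++)
		if (m->pages[i].cached)
			n++;
	return n;
}

/*
 * Invalidate after a write: never return with cached pages left in
 * the written range. On -EBUSY, fall back to the forced truncate
 * rather than resetting the error and moving on.
 */
int sketch_invalidate_after_write(struct sketch_mapping *m, size_t first, size_t last)
{
	int ret;

	assert(first <= last);

	ret = sketch_invalidate_range(m, first, last);
	if (ret == -EBUSY) {
		sketch_truncate_range(m, first, last);
		ret = 0;
	}
	return ret;
}
```

With one page in the range marked as mapped, the gentle path alone returns -EBUSY and leaves that page cached, while the combined helper returns 0 with nothing left in the range.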

But from my understanding 'ceph_direct_read_write()' is still buggy. What if a page fault happens just after 'truncate_inode_pages_range()'? Can this happen? Or should we leave it to user space to guarantee this by using file locks?

Thoughts?

-- Xiubo

+ }
pos += len;
written += len;
dout("sync_write written %d\n", written);