Re: [PATCH v2] ceph: invalidate pages when doing DIO in encrypted inodes

From: Xiubo Li
Date: Wed Apr 06 2022 - 10:32:00 EST



On 4/6/22 6:50 PM, Luís Henriques wrote:
Xiubo Li <xiubli@xxxxxxxxxx> writes:

On 4/1/22 9:32 PM, Luís Henriques wrote:
When doing DIO on an encrypted inode, we need to invalidate the page cache in
the range being written to, otherwise the cache will include invalid data.

Signed-off-by: Luís Henriques <lhenriques@xxxxxxx>
---
fs/ceph/file.c | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)

Changes since v1:
- Replaced truncate_inode_pages_range() by invalidate_inode_pages2_range
- Call fscache_invalidate with FSCACHE_INVAL_DIO_WRITE if we're doing DIO

Note: I'm not really sure this last change is required, it doesn't really
affect generic/647 result, but seems to be the most correct.

diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index 5072570c2203..b2743c342305 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -1605,7 +1605,7 @@ ceph_sync_write(struct kiocb *iocb, struct iov_iter *from, loff_t pos,
if (ret < 0)
return ret;
- ceph_fscache_invalidate(inode, false);
+ ceph_fscache_invalidate(inode, (iocb->ki_flags & IOCB_DIRECT));
ret = invalidate_inode_pages2_range(inode->i_mapping,
pos >> PAGE_SHIFT,
(pos + count - 1) >> PAGE_SHIFT);
@@ -1895,6 +1895,15 @@ ceph_sync_write(struct kiocb *iocb, struct iov_iter *from, loff_t pos,
req->r_inode = inode;
req->r_mtime = mtime;
+ if (IS_ENCRYPTED(inode) && (iocb->ki_flags & IOCB_DIRECT)) {
+ ret = invalidate_inode_pages2_range(
+ inode->i_mapping,
+ write_pos >> PAGE_SHIFT,
+ (write_pos + write_len - 1) >> PAGE_SHIFT);
+ if (ret < 0)
+ dout("invalidate_inode_pages2_range returned %d\n", ret);
+ }

Shouldn't we fail it if 'invalidate_inode_pages2_range()' fails here?

Yeah, I'm not really sure. I'm simply following the usual pattern where
an invalidate_inode_pages2_range() failure is logged and ignored. And
this is not ceph-specific, other filesystems seem to do the same thing.
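
For illustration only, the log-and-ignore pattern and the page-index arithmetic from the hunk above can be sketched in userspace C. Everything here is an assumption made for the sketch: TOY_PAGE_SHIFT assumes 4 KiB pages, fake_invalidate_range() is a hypothetical stand-in that always fails, and dio_write_sketch() is not the ceph code path.

```c
#include <stdio.h>

#define TOY_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Index of the first page touched by a write starting at byte pos. */
unsigned long long first_page_index(unsigned long long pos)
{
	return pos >> TOY_PAGE_SHIFT;
}

/* Index of the last page touched by a write of len bytes (len > 0). */
unsigned long long last_page_index(unsigned long long pos,
				   unsigned long long len)
{
	return (pos + len - 1) >> TOY_PAGE_SHIFT;
}

/*
 * Hypothetical stand-in for an invalidation helper. It always reports
 * -16 (the value of -EBUSY on Linux) so that the failure path runs.
 */
int fake_invalidate_range(unsigned long long first, unsigned long long last)
{
	printf("invalidate pages %llu..%llu\n", first, last);
	return -16;
}

/*
 * Log-and-ignore: the invalidation failure is reported, but the write
 * still goes ahead and the full length is returned.
 */
long long dio_write_sketch(unsigned long long pos, unsigned long long len)
{
	int ret = fake_invalidate_range(first_page_index(pos),
					last_page_index(pos, len));

	if (ret < 0)
		fprintf(stderr, "invalidate returned %d (ignored)\n", ret);
	return (long long)len;
}
```

With these definitions, a 2-byte write at offset 4095 straddles pages 0 and 1, which is why the last index uses pos + len - 1 rather than pos + len.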

I think that is because they use it only to invalidate the range; they do not depend on it to write back the dirty pages.

For the writeback they may call 'filemap_fdatawrite_range()', etc.

I saw that at the beginning of 'ceph_sync_write()' it calls 'filemap_write_and_wait_range()' too, so the dirty pages should already have been flushed.
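
To make the ordering argument concrete, here is a toy userspace model, not kernel code. The toy_* names are hypothetical, and the model simplifies the real helpers: it treats any dirty page as impossible to drop, and it treats writeback as simply clearing the dirty bit.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_NPAGES 8
#define TOY_EBUSY  16	/* assumption: mirrors the EBUSY value on Linux */

struct toy_page {
	bool cached;	/* page is present in the toy page cache */
	bool dirty;	/* page holds data not yet written back */
};

struct toy_mapping {
	struct toy_page pages[TOY_NPAGES];
};

/*
 * Toy analogue of a write-and-wait step: after it returns, pages in
 * [first, last] are still cached but no longer dirty.
 */
void toy_write_and_wait_range(struct toy_mapping *m, size_t first, size_t last)
{
	for (size_t i = first; i <= last && i < TOY_NPAGES; i++)
		m->pages[i].dirty = false;
}

/*
 * Toy analogue of an invalidate step: drops clean cached pages in
 * [first, last] and returns -TOY_EBUSY if any page could not be dropped.
 */
int toy_invalidate_range(struct toy_mapping *m, size_t first, size_t last)
{
	int ret = 0;

	for (size_t i = first; i <= last && i < TOY_NPAGES; i++) {
		struct toy_page *p = &m->pages[i];

		if (!p->cached)
			continue;
		if (p->dirty) {
			ret = -TOY_EBUSY;	/* page stays in the cache */
			continue;
		}
		p->cached = false;
	}
	return ret;
}
```

In this model, invalidating a range that still has a dirty page returns -TOY_EBUSY and leaves that page cached, while flushing first with toy_write_and_wait_range() makes the same invalidation return 0.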

-- Xiubo



Cheers,