Re: [External] Re: [PATCH v2] fuse: fix deadlock between atomic O_TRUNC open() and page invalidations

From: Jiachen Zhang
Date: Fri Mar 04 2022 - 01:23:24 EST


On Wed, Feb 23, 2022 at 4:50 PM Miklos Szeredi <miklos@xxxxxxxxxx> wrote:
>
> On Wed, Dec 29, 2021 at 12:02:39PM +0800, Jiachen Zhang wrote:
> > fuse_finish_open() will be called with FUSE_NOWRITE set in case of atomic
> > O_TRUNC open(), so commit 76224355db75 ("fuse: truncate pagecache on
> > atomic_o_trunc") replaced invalidate_inode_pages2() by truncate_pagecache()
> > in such a case to avoid the A-A deadlock. However, we found another A-B-B-A
> > deadlock related to the case above, which will cause the xfstests
> > generic/464 testcase hung in our virtio-fs test environment.
> >
> > For example, consider two processes concurrently open one same file, one
> > with O_TRUNC and another without O_TRUNC. The deadlock case is described
> > below, if open(O_TRUNC) is already set_nowrite(acquired A), and is trying
> > to lock a page (acquiring B), open() could have held the page lock
> > (acquired B), and waiting on the page writeback (acquiring A). This would
> > lead to deadlocks.
> >
> > open(O_TRUNC)
> > ----------------------------------------------------------------
> > fuse_open_common
> > inode_lock [C acquire]
> > fuse_set_nowrite [A acquire]
> >
> > fuse_finish_open
> > truncate_pagecache
> > lock_page [B acquire]
> > truncate_inode_page
> > unlock_page [B release]
> >
> > fuse_release_nowrite [A release]
> > inode_unlock [C release]
> > ----------------------------------------------------------------
> >
> > open()
> > ----------------------------------------------------------------
> > fuse_open_common
> > fuse_finish_open
> > invalidate_inode_pages2
> > lock_page [B acquire]
> > fuse_launder_page
> > fuse_wait_on_page_writeback [A acquire & release]
> > unlock_page [B release]
> > ----------------------------------------------------------------
> >
> > Besides this case, all calls of invalidate_inode_pages2() and
> > invalidate_inode_pages2_range() in fuse code also can deadlock with
> > open(O_TRUNC). This commit tries to fix it by adding a new lock,
> > atomic_o_trunc, to protect the areas with the A-B-B-A deadlock risk.
>
> Thanks. Can you please try the following patch? Instead of introducing a new
> lock it tries to fix this by moving the truncate_pagecache() out of the nowrite
> protected section.
>
> Thanks,
> Miklos
>
> ---
> diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
> index 656e921f3506..56f439719129 100644
> --- a/fs/fuse/dir.c
> +++ b/fs/fuse/dir.c
> @@ -537,6 +537,7 @@ static int fuse_create_open(struct inode *dir, struct dentry *entry,
> struct fuse_file *ff;
> void *security_ctx = NULL;
> u32 security_ctxlen;
> + bool trunc = flags & O_TRUNC;
>
> /* Userspace expects S_IFREG in create mode */
> BUG_ON((mode & S_IFMT) != S_IFREG);
> @@ -561,7 +562,7 @@ static int fuse_create_open(struct inode *dir, struct dentry *entry,
> inarg.mode = mode;
> inarg.umask = current_umask();
>
> - if (fm->fc->handle_killpriv_v2 && (flags & O_TRUNC) &&
> + if (fm->fc->handle_killpriv_v2 && trunc &&
> !(flags & O_EXCL) && !capable(CAP_FSETID)) {
> inarg.open_flags |= FUSE_OPEN_KILL_SUIDGID;
> }
> @@ -623,6 +624,8 @@ static int fuse_create_open(struct inode *dir, struct dentry *entry,
> } else {
> file->private_data = ff;
> fuse_finish_open(inode, file);
> + if (fm->fc->atomic_o_trunc && trunc)
> + truncate_pagecache(inode, 0);
> }
> return err;
>
> diff --git a/fs/fuse/file.c b/fs/fuse/file.c
> index 829094451774..2e041708ef44 100644
> --- a/fs/fuse/file.c
> +++ b/fs/fuse/file.c
> @@ -210,7 +210,6 @@ void fuse_finish_open(struct inode *inode, struct file *file)
> fi->attr_version = atomic64_inc_return(&fc->attr_version);
> i_size_write(inode, 0);
> spin_unlock(&fi->lock);
> - truncate_pagecache(inode, 0);
> file_update_time(file);
> fuse_invalidate_attr_mask(inode, FUSE_STATX_MODSIZE);
> } else if (!(ff->open_flags & FOPEN_KEEP_CACHE)) {
> @@ -239,30 +238,32 @@ int fuse_open_common(struct inode *inode, struct file *file, bool isdir)
> if (err)
> return err;
>
> - if (is_wb_truncate || dax_truncate) {
> + if (is_wb_truncate || dax_truncate)
> inode_lock(inode);
> - fuse_set_nowrite(inode);
> - }
>
> if (dax_truncate) {
> filemap_invalidate_lock(inode->i_mapping);
> err = fuse_dax_break_layouts(inode, 0, 0);
> if (err)
> - goto out;
> + goto out_inode_unlock;
> }
>
> + if (is_wb_truncate || dax_truncate)
> + fuse_set_nowrite(inode);
> +
> err = fuse_do_open(fm, get_node_id(inode), file, isdir);
> if (!err)
> fuse_finish_open(inode, file);
>
> -out:
> + if (is_wb_truncate | dax_truncate)
> + fuse_release_nowrite(inode);
> + if (fc->atomic_o_trunc && (file->f_flags & O_TRUNC))
> + truncate_pagecache(inode, 0);
> if (dax_truncate)
> filemap_invalidate_unlock(inode->i_mapping);
> -
> - if (is_wb_truncate | dax_truncate) {
> - fuse_release_nowrite(inode);
> +out_inode_unlock:
> + if (is_wb_truncate | dax_truncate)
> inode_unlock(inode);
> - }
>
> return err;
> }

Hi, Miklos,

I tested this fix, and it did pass the xfstests generic/464 in our
environment. However, if I understand correctly, one of the usages of
the nowrite is to protect file truncation, as said in the commit
message of e4648309b85a78f8c787457832269a8712a8673e. So, does that
mean this fix may introduce some other problems?

Thanks,
Jiachen