Re: [PATCH][RFC] ecryptfs_lookup_interpose(): lower_dentry->d_inode is not stable

From: Amir Goldstein
Date: Wed Nov 13 2019 - 02:01:56 EST


On Sun, Nov 3, 2019 at 8:52 PM Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:
>
> lower_dentry can't go from positive to negative (we have it pinned),
> but it *can* go from negative to positive. So fetching ->d_inode
> into a local variable, doing a blocking allocation, checking that
> now ->d_inode is non-NULL and feeding the value we'd fetched
> earlier to a function that won't accept NULL is not a good idea.
>
> Cc: stable@xxxxxxxxxxxxxxx
> Signed-off-by: Al Viro <viro@xxxxxxxxxxxxxxxxxx>
> ---
> diff --git a/fs/ecryptfs/inode.c b/fs/ecryptfs/inode.c
> index a905d5f4f3b0..3c2298721359 100644
> --- a/fs/ecryptfs/inode.c
> +++ b/fs/ecryptfs/inode.c
> @@ -319,7 +319,7 @@ static int ecryptfs_i_size_read(struct dentry *dentry, struct inode *inode)
> static struct dentry *ecryptfs_lookup_interpose(struct dentry *dentry,
> struct dentry *lower_dentry)
> {
> - struct inode *inode, *lower_inode = d_inode(lower_dentry);
> + struct inode *inode, *lower_inode;
> struct ecryptfs_dentry_info *dentry_info;
> struct vfsmount *lower_mnt;
> int rc = 0;
> @@ -339,7 +339,15 @@ static struct dentry *ecryptfs_lookup_interpose(struct dentry *dentry,
> dentry_info->lower_path.mnt = lower_mnt;
> dentry_info->lower_path.dentry = lower_dentry;
>
> - if (d_really_is_negative(lower_dentry)) {
> + /*
> + * negative dentry can go positive under us here - its parent is not
> + * locked. That's OK and that could happen just as we return from
> + * ecryptfs_lookup() anyway. Just need to be careful and fetch
> + * ->d_inode only once - it's not stable here.
> + */
> + lower_inode = READ_ONCE(lower_dentry->d_inode);
> +
> + if (!lower_inode) {
> /* We want to add because we couldn't find in lower */
> d_add(dentry, NULL);
> return NULL;

Sigh!

Open coding a human-readable macro to solve a subtle lookup race
doesn't sound like a scalable solution.
I have a feeling this is not the last patch we will see along
those lines.

Seeing that developers are already confused about when they should
use d_really_is_negative() rather than d_is_negative() [1], and that
we probably don't want to add d_really_really_is_negative(), how about
moving that READ_ONCE into d_really_is_negative() and
repurposing it as the macro to use when races with lookup are
a concern?
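
To make the suggestion concrete, here is a rough user-space sketch of
what I have in mind. It is not kernel code: the READ_ONCE() below is a
simplified stand-in for the kernel macro, the struct definitions are
stripped-down models, and the names d_really_is_negative_once(),
d_inode_once(), must_have_inode() and interpose_model() are all
hypothetical. It also shows that this call site needs the fetched
pointer and not just the boolean, so a value-returning variant may be
needed alongside the predicate.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified user-space stand-in for the kernel's READ_ONCE():
 * force exactly one load of x through a volatile access. */
#define READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

/* Stripped-down models of the kernel structures (assumption). */
struct inode { unsigned long i_ino; };
struct dentry { struct inode *d_inode; };

/* Hypothetical: negativity test that does a single marked load. */
static inline int d_really_is_negative_once(const struct dentry *dentry)
{
	return READ_ONCE(dentry->d_inode) == NULL;
}

/* Hypothetical: fetch ->d_inode once so that the NULL check and the
 * later use see the same value. */
static inline struct inode *d_inode_once(const struct dentry *dentry)
{
	return READ_ONCE(dentry->d_inode);
}

/* Models "a function that won't accept NULL". */
static unsigned long must_have_inode(struct inode *inode)
{
	assert(inode != NULL);
	return inode->i_ino;
}

/* Models the patched flow: one fetch, then test and use that value. */
static unsigned long interpose_model(struct dentry *lower_dentry)
{
	struct inode *lower_inode = d_inode_once(lower_dentry);

	if (!lower_inode)
		return 0;	/* negative: nothing to interpose */
	return must_have_inode(lower_inode);
}
```

The predicate alone would close the "test and use disagree" window only
for callers that never touch ->d_inode afterwards; callers like
ecryptfs_lookup_interpose() would still have to keep the fetched value
in a local.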

Thanks,
Amir.

[1] https://lore.kernel.org/linux-fsdevel/20190903135803.GA25692@hsiangkao-HP-ZHAN-66-Pro-G1/