Re: processes hung after sys_renameat, and 'missing' processes

From: Eric W. Biederman
Date: Thu Jun 07 2012 - 22:08:31 EST


Al Viro <viro@xxxxxxxxxxxxxxxxxx> writes:

> On Thu, Jun 07, 2012 at 04:57:13PM -0700, Linus Torvalds wrote:
>
>> Any per-filesystem mutex should do, so if sysfs always holds the
>> sysfs_mutex - and never allows user-initiated renames - it should be
>> safe.
>
> Frankly, I would very much prefer to have the same locking rules wherever
> possible. The locking system is already overcomplicated and making its
> analysis fs-dependent as well... <shudder> Sure, we can do that, and that
> might even work, until we find out that some piece of code that started
> as a helper to some function never called on sysfs dentries had been
> reused on the path that *is* reachable on sysfs. At which point we are
> suddenly in trouble.

Staring at it I see what I was missing. The practical issue is
lock_rename(), and any parts of the vfs that depend on lock_rename().

d_move and the dcache are made safe just by rename_lock. However, other
parts of the vfs that rely on d_ancestor are not. I can't immediately
see a case that really cares, but I can't easily rule one out either.
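
For readers following along, here is a rough userspace sketch of the
ancestor walk and of how lock_rename() uses it to pick a lock order.
It is only a model: toy_dentry, toy_d_ancestor and
toy_rename_lock_order are made-up names, and no real locking or
dcache structures are involved. The point is that the answer depends
on the parent links staying put while they are walked.

```c
#include <stddef.h>

/* Toy stand-in for struct dentry: only the parent link matters here. */
struct toy_dentry {
	const char *name;
	struct toy_dentry *parent;	/* the root points to itself */
};

static int toy_is_root(const struct toy_dentry *d)
{
	return d->parent == d;
}

/*
 * Same walk as d_ancestor(): return the ancestor of p2 that is a direct
 * child of p1, or NULL if p1 is not a proper ancestor of p2.
 */
static struct toy_dentry *toy_d_ancestor(struct toy_dentry *p1,
					 struct toy_dentry *p2)
{
	struct toy_dentry *p;

	for (p = p2; !toy_is_root(p); p = p->parent) {
		if (p->parent == p1)
			return p;
	}
	return NULL;
}

enum toy_lock_order {
	TOY_LOCK_SAME,		/* same directory, a single lock */
	TOY_LOCK_P1_FIRST,
	TOY_LOCK_P2_FIRST,
};

/*
 * Model of the ordering decision in lock_rename(): the ancestor
 * directory is locked first; unrelated directories fall back to p1
 * first. This is only safe if nothing can change the parent links
 * between the walk and taking the locks.
 */
static enum toy_lock_order toy_rename_lock_order(struct toy_dentry *p1,
						 struct toy_dentry *p2)
{
	if (p1 == p2)
		return TOY_LOCK_SAME;
	if (toy_d_ancestor(p2, p1))	/* p2 is above p1 */
		return TOY_LOCK_P2_FIRST;
	return TOY_LOCK_P1_FIRST;	/* p1 is above p2, or unrelated */
}
```

If a directory could be reparented in the middle of toy_d_ancestor(),
two callers could come to opposite conclusions about the order, which
is the kind of problem a per-filesystem mutex is meant to prevent.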

> I wouldn't be bothered so much if the overall picture had been simpler;
> unfortunately, it isn't.
>
> Eric, how about this - if nothing else, that makes code in there simpler
> and less dependent on details of VFS guts:
>
> diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
> index e6bb9b2..5579826 100644
> --- a/fs/sysfs/dir.c
> +++ b/fs/sysfs/dir.c
> @@ -363,7 +363,7 @@ static void sysfs_dentry_iput(struct dentry *dentry, struct inode *inode)
>  	iput(inode);
>  }
>
> -static const struct dentry_operations sysfs_dentry_ops = {
> +const struct dentry_operations sysfs_dentry_ops = {
>  	.d_revalidate = sysfs_dentry_revalidate,
>  	.d_delete = sysfs_dentry_delete,
>  	.d_iput = sysfs_dentry_iput,
> @@ -795,16 +795,8 @@ static struct dentry * sysfs_lookup(struct inode *dir, struct dentry *dentry,
>  	}
>
>  	/* instantiate and hash dentry */
> -	ret = d_find_alias(inode);
> -	if (!ret) {
> -		d_set_d_op(dentry, &sysfs_dentry_ops);
> -		dentry->d_fsdata = sysfs_get(sd);
> -		d_add(dentry, inode);
> -	} else {
> -		d_move(ret, dentry);
> -		iput(inode);
> -	}
> -
> +	dentry->d_fsdata = sysfs_get(sd);
> +	ret = d_materialise_unique(dentry, inode);

I have a small problem with d_materialise_unique. For files (as opposed
to directories) d_materialise_unique calls __d_instantiate_unique, and
__d_instantiate_unique does not detect renames. So at the very least it
misses renames of sysfs symlinks.

Could we put together a d_materialise_unalias for inodes that we know
only ever have one dentry? That I would be happy to use.

I think the reason I wound up with my own version was that the dcache
did not provide what I needed and it was just a few lines to code my own.
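
To make the request concrete, here is a toy userspace model of the
behaviour I have in mind, mirroring what the removed hunk does with
d_find_alias/d_add/d_move. Everything in it is hypothetical:
toy_dentry, toy_inode and toy_materialise_unalias are illustration
names, not existing dcache interfaces, and refcounting and locking are
left out entirely.

```c
#include <stddef.h>

struct toy_inode;

/* Toy stand-in for struct dentry. */
struct toy_dentry {
	const char *name;
	struct toy_dentry *parent;
	struct toy_inode *inode;	/* NULL while the dentry is negative */
};

/* Toy inode that is known to have at most one dentry. */
struct toy_inode {
	struct toy_dentry *alias;
};

/*
 * Attach inode to the freshly looked-up dentry. If the inode already
 * has a dentry, the object was renamed without the dcache being told,
 * so move the existing dentry to the new name and parent and return it.
 * Returns NULL when the new dentry itself was instantiated.
 */
static struct toy_dentry *toy_materialise_unalias(struct toy_dentry *dentry,
						  struct toy_inode *inode)
{
	struct toy_dentry *old = inode->alias;

	if (!old) {
		dentry->inode = inode;
		inode->alias = dentry;
		return NULL;
	}

	/* Rename detected: the old dentry takes over the new location. */
	old->name = dentry->name;
	old->parent = dentry->parent;
	return old;
}
```

The model makes no distinction between directories and files, which is
the property I am after for symlinks.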

> diff --git a/fs/sysfs/mount.c b/fs/sysfs/mount.c
> index 52c3bdb..c15a7a3 100644
> --- a/fs/sysfs/mount.c
> +++ b/fs/sysfs/mount.c
> @@ -68,6 +68,7 @@ static int sysfs_fill_super(struct super_block *sb, void *data, int silent)
>  	}
>  	root->d_fsdata = &sysfs_root;
>  	sb->s_root = root;
> +	sb->s_d_op = &sysfs_dentry_ops;

I have no problem with this bit. To answer your earlier question: this
code predates s_d_op, which is why sysfs was not using it.

>  	return 0;
>  }
>
> diff --git a/fs/sysfs/sysfs.h b/fs/sysfs/sysfs.h
> index 661a963..d73c093 100644
> --- a/fs/sysfs/sysfs.h
> +++ b/fs/sysfs/sysfs.h
> @@ -157,6 +157,7 @@ extern struct kmem_cache *sysfs_dir_cachep;
>   */
>  extern struct mutex sysfs_mutex;
>  extern spinlock_t sysfs_assoc_lock;
> +extern const struct dentry_operations sysfs_dentry_ops;
>
>  extern const struct file_operations sysfs_dir_operations;
>  extern const struct inode_operations sysfs_dir_inode_operations;