Re: livelock avoidance in sget()

From: Greg KH
Date: Sat Jul 20 2013 - 16:30:12 EST


Hi Al,

Is the patch below something that we need to worry about for 3.10 and
older kernels? Or does the recent changes to the vfs in 3.11-rc1 make
it so that this can't be hit in older kernels?

thanks,

greg k-h

On Sat, Jul 20, 2013 at 07:34:27PM +0000, Linux Kernel Mailing List wrote:
> Gitweb: http://git.kernel.org/linus/;a=commit;h=acfec9a5a892f98461f52ed5770de99a3e571ae2
> Commit: acfec9a5a892f98461f52ed5770de99a3e571ae2
> Parent: ba57ea64cb1820deb37637de0fdb107f0dc90089
> Author: Al Viro <viro@xxxxxxxxxxxxxxxxxx>
> AuthorDate: Sat Jul 20 03:13:55 2013 +0400
> Committer: Al Viro <viro@xxxxxxxxxxxxxxxxxx>
> CommitDate: Sat Jul 20 04:58:58 2013 +0400
>
> livelock avoidance in sget()
>
> Eric Sandeen has found a nasty livelock in sget() - take a mount(2) about
> to fail. The superblock is on ->fs_supers, ->s_umount is held exclusive,
> ->s_active is 1. Along comes two more processes, trying to mount the same
> thing; sget() in each is picking that superblock, bumping ->s_count and
> trying to grab ->s_umount. ->s_active is 3 now. Original mount(2)
> finally gets to deactivate_locked_super() on failure; ->s_active is 2,
> superblock is still ->fs_supers because shutdown will *not* happen until
> ->s_active hits 0. ->s_umount is dropped and now we have two processes
> chasing each other:
> s_active = 2, A acquired ->s_umount, B blocked
> A sees that the damn thing is stillborn, does deactivate_locked_super()
> s_active = 1, A drops ->s_umount, B gets it
> A restarts the search and finds the same superblock. And bumps it ->s_active.
> s_active = 2, B holds ->s_umount, A blocked on trying to get it
> ... and we are in the earlier situation with A and B switched places.
>
> The root cause, of course, is that ->s_active should not grow until we'd
> got MS_BORN. Then failing ->mount() will have deactivate_locked_super()
> shut the damn thing down. Fortunately, it's easy to do - the key point
> is that grab_super() is called only for superblocks currently on ->fs_supers,
> so it can bump ->s_count and grab ->s_umount first, then check MS_BORN and
> bump ->s_active; we must never increment ->s_count for superblocks past
> ->kill_sb(), but grab_super() is never called for those.
>
> The bug is pretty old; we would've caught it by now, if not for accidental
> exclusion between sget() for block filesystems; the things like cgroup or
> e.g. mtd-based filesystems don't have anything of that sort, so they get
> bitten. The right way to deal with that is obviously to fix sget()...
>
> Signed-off-by: Al Viro <viro@xxxxxxxxxxxxxxxxxx>
> ---
> fs/super.c | 25 ++++++++++---------------
> 1 files changed, 10 insertions(+), 15 deletions(-)
>
> diff --git a/fs/super.c b/fs/super.c
> index 7465d43..68307c0 100644
> --- a/fs/super.c
> +++ b/fs/super.c
> @@ -336,19 +336,19 @@ EXPORT_SYMBOL(deactivate_super);
> * and want to turn it into a full-blown active reference. grab_super()
> * is called with sb_lock held and drops it. Returns 1 in case of
> * success, 0 if we had failed (superblock contents was already dead or
> - * dying when grab_super() had been called).
> + * dying when grab_super() had been called). Note that this is only
> + * called for superblocks not in rundown mode (== ones still on ->fs_supers
> + * of their type), so increment of ->s_count is OK here.
> */
> static int grab_super(struct super_block *s) __releases(sb_lock)
> {
> - if (atomic_inc_not_zero(&s->s_active)) {
> - spin_unlock(&sb_lock);
> - return 1;
> - }
> - /* it's going away */
> s->s_count++;
> spin_unlock(&sb_lock);
> - /* wait for it to die */
> down_write(&s->s_umount);
> + if ((s->s_flags & MS_BORN) && atomic_inc_not_zero(&s->s_active)) {
> + put_super(s);
> + return 1;
> + }
> up_write(&s->s_umount);
> put_super(s);
> return 0;
> @@ -463,11 +463,6 @@ retry:
> destroy_super(s);
> s = NULL;
> }
> - down_write(&old->s_umount);
> - if (unlikely(!(old->s_flags & MS_BORN))) {
> - deactivate_locked_super(old);
> - goto retry;
> - }
> return old;
> }
> }
> @@ -660,10 +655,10 @@ restart:
> if (hlist_unhashed(&sb->s_instances))
> continue;
> if (sb->s_bdev == bdev) {
> - if (grab_super(sb)) /* drops sb_lock */
> - return sb;
> - else
> + if (!grab_super(sb))
> goto restart;
> + up_write(&sb->s_umount);
> + return sb;
> }
> }
> spin_unlock(&sb_lock);
> --
> To unsubscribe from this list: send the line "unsubscribe git-commits-head" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/