Re: [PATCH] fix race in mark_mounts_for_expiry()

From: Jamie Lokier
Date: Wed May 18 2005 - 12:37:46 EST


Miklos Szeredi wrote:
> > - Causes every mount in a mount tree to be detached (independently),
> > when last task associated with a namespace is destroyed.
>
> I don't understand. The tree _has_ to be detached when no task uses
> the namespace. That is the main purpose of the namespace structure,
> to provide an anchor for the mount tree.

That makes less sense if we allow other tasks to be using a namespace
through a passing a file descriptor, and then the last task which has
current->namespace equal to that namespace exits. It makes no sense
to me that the mount which is still accessible through the file
descriptor is suddenly detached from it's parent and children mounts.

Why is it not good enough to detach each vfsmnt when the last
reference to each vfsmnt is dropped? In other words, simply when the
vfsmnt becomes unreachable?

That would detach whole vfsmnt trees when the last reference to the
root of the tree is dropped, rather than when a task exits even though
another tasks has a file descriptor still pointing to the tree.

That would be a change of behaviour, but it seems like quite a
sensible change in behaviour, and would not affect "normal" namespace
users.

There's one other purpose to mnt_namespace:

- The list in /proc/mounts.

But really it would *improve* security if the list in /proc/mounts was
derived by walking the vfsmnt subtree starting at current->chroot.

> > And this invisible effect:
> >
> > - More concurrency than a global mount lock would have.
>
> This is the key issue I think. It may even have security implications
> in the future if we want to allow unprivileged mounts : a user could
> DoS the system by just doing lots of mounts umounts in a private
> namespace.

Not an argument. If that's a problem, they could easily DoS the
system by doing lots of mount/umounts in the initial namespace.

The proper thing to do about concurrency is to make sure the slow
operation (getting the superblock) is done outside any mount
semaphore, and just do the tree traversal/splicing operations inside
it. The code seems to do that already, so I think a global semaphore
instead of mnt_namespace->sem would make no practical difference.

But if it was a problem, we'd come up with something. Let's
concentrate on the behaviour we want, rather than a minor detail like
that.

I think the common sense behaviour is to say that vfsmnts remain
mounted as long as they are reachable - from a file descriptor, task's
root, or task's cwd.

-- Jamie
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/