Re: [PATCH] fix race in mark_mounts_for_expiry()

From: Miklos Szeredi
Date: Thu May 19 2005 - 07:45:08 EST


> > Define unreachable.
>
> Unreachable as in no file descriptors (or chroot/cwd) refer to the
> vfsmnt, either directly or indirectly through a path traversal.
>
> > Then define a mechanism, by which it can be detected.
>
> There aren't any vfsmnt->vfsmnt cycles... They're a forest, vfsmnts
> don't move from one tree to another (bind mounts don't link them, they
> create new vfsmnts), and each tree can be referenced by a file
> descriptor at any point on the tree.
>
> It rather hinges on which of these behaviours you prefer:
>
> 1. A file descriptor/chroot/cwd reference to any point in a vfsmnt
> tree means the whole tree is retained. This means ".." remains
> always accessible: fchdir(fd); open("..") continues to access
> that whole tree as you still have fd.
>
> 2. A file descriptor/chroot/cwd reference to any point in a vfsmnt
> tree means the subtree from that point is retained, and parents
> may disappear if there are no references (not counting ".." as a
> reference). This behaviour is more sensible for chroots, where
> the parents should be inaccessible anyway.
>
> 3. A mixture, where current->root references only maintain the
> subtree rooted at that point, and other references, if outside
> the current->root subtree, retain the whole tree accessible from
> those references.

4. Everything stays the same, except chroot() changes
current->namespace to current->fs->root_mnt->namespace. This would
address your concern with chroot.

> The appropriate data structure / algorithm depends on which
> behaviour is preferred. So which is it?

My preference is 1) or 4).

2) and 3) would have some pretty strange properties (in some
situations you can enter a directory, but you cannot get out with 'cd
..' any more)

> 1 Is best done with a mnt_namespace structure, but references to it
> counted when vfsmnts are referenced by file descriptors/root/cwd,
> _not_ references by tasks (no current->namespace).

So you are proposing selective replacement of

mntget(mnt);

with

mntget(mnt);
get_namespace(mnt->mnt_namespace);

and similar for mntput()?

What do you do on unmount? If you set mnt->mnt_namespace to NULL,
you'd have to do a put_namespace() for each file descriptor/root/cwd
referencing the mount, but how do you do that?

> 2 is best done by simply reference counting vfsmnts.

It would need quite a revamp of the current reference counting. If
I'm not mistaken, currently each vfsmount has an initial reference
given to it in alloc_vfsmnt. And on attach, the parent namespace's
count is also increased, and vice-versa for unmount.

I'm not sure what purpose referencing the parent serves, but it won't
work with 2). There you'd need referencing the other way round:
parents holding a reference on their children, and have no initial
reference.

With 2) it's also questionable, how you copy the namespace in clone().
Do you just copy the tree under current->fs->root_mnt? Or do you copy
tree under cwd as well (if it's disjunct from the root tree)?

Miklos
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/