Re: Race between "mount" uevent and /proc/mounts?

From: Sergey Vlasov
Date: Thu Nov 03 2005 - 05:53:16 EST


On Thu, Nov 03, 2005 at 08:07:13AM +0000, Al Viro wrote:
> On Wed, Nov 02, 2005 at 04:01:18PM +0300, Sergey Vlasov wrote:
> > @@ -120,6 +122,10 @@ static void detach_mnt(struct vfsmount *
> > list_del_init(&mnt->mnt_child);
> > list_del_init(&mnt->mnt_hash);
> > old_nd->dentry->d_mounted--;
> > + if (current->namespace) {
> > + current->namespace->event++;
> > + wake_up_interruptible(&mounts_wait);
> > + }
> > }
> >
> > static void attach_mnt(struct vfsmount *mnt, struct nameidata *nd)
>
> Ugh... So umount -l gives one hell of a spew for no good reason.

umount -l will change contents of /proc/mounts, so waking up poll() on
that file seems to be right in this case (even if the filesystem is still
mounted internally, it is no longer accessible).

> > @@ -129,6 +135,8 @@ static void attach_mnt(struct vfsmount *
> > list_add(&mnt->mnt_hash, mount_hashtable+hash(nd->mnt, nd->dentry));
> > list_add_tail(&mnt->mnt_child, &nd->mnt->mnt_mounts);
> > nd->dentry->d_mounted++;
> > + current->namespace->event++;
> > + wake_up_interruptible(&mounts_wait);
> > }
>
> Bad idea - copy_tree() will spew *and* we get bogus events on CLONE_NEWNS
> (i.e. current->namespace is not even the namespace being modified).

IMHO it's not spew, but real changes in the mount tree.

CLONE_NEWNS handling may really be broken (maybe mnt->mnt_namespace should
be used instead of current->namespace, but I'm not sure if it is set
correctly at this place - it is certainly wrong in detach_mnt()).

> > @@ -1093,6 +1104,7 @@ int copy_namespace(int flags, struct tas
> > atomic_set(&new_ns->count, 1);
> > init_rwsem(&new_ns->sem);
> > INIT_LIST_HEAD(&new_ns->list);
> > + new_ns->event = 0;
>
> BTW, I'd rather make that queue per-namespace...

You mean mount_wait, so that only tasks which wait for changes in a
particular namespace would be woken up? Yes, that would be better (if
namespaces are really used).

> > + down_read(&namespace->sem);
> > + if (private->last_event != namespace->event) {
> > + private->last_event = namespace->event;
> > + ret = POLLIN | POLLRDNORM;
>
> Umm... I'd rather use POLLERR, since POLLIN doesn't apply here - it's not
> a stream of data that gives blocking read() when reached the end.

This is copied from /proc/bus/usb/devices, which has similar behavior.

Attachment: pgp00000.pgp
Description: PGP signature