Re: [patch] stop inotify from sending random DELETE_SELF event under load

From: John McCutchan
Date: Tue Sep 20 2005 - 20:41:41 EST


On Wed, 2005-09-21 at 02:01 +0100, Al Viro wrote:
> On Tue, Sep 20, 2005 at 06:53:34PM -0400, John McCutchan wrote:
> > Is there some reason we can't just do this from vfs_unlink?
> >
> > inode = dentry->inode;
> > iget (inode);
> > d_delete (dentry);
> > fsnotify_inoderemove (inode);
> > iput (inode);
> >
> > This would allow us to have immediate event notification, and avoid a
> > race with the inode going away, right?
>
> Playing with references to struct inode means playing dirty tricks
> behind the filesystem's back. Doing that in a way that really changes
> inode lifetime means asking for trouble. Combined with a dirty trick
> *already* pulled by sys_unlink() to postpone the final iput until after
> we unlock the parent, it means breakage (and aforementioned dirty trick
> took some rather interesting logic to compensate for in the first place).
>
> Moreover, your suggestion would do that to _everyone_, whether they use
> inotify or not. NAK.

Got it.

>
> > static inline void fsnotify_inoderemove(struct inode *inode)
> > {
> > -	inotify_inode_queue_event(inode, IN_DELETE_SELF, 0, NULL);
> > -	inotify_inode_is_dead(inode);
> > +	inotify_inode_queue_event(inode, IN_DELETE_SELF, inode->i_nlink, NULL);
> > +	if (inode->i_nlink == 0)
> > +		inotify_inode_is_dead(inode);
> > }
>
> Assumes that filesystem treats ->i_nlink on final iput() in usual way.
> It doesn't have to.
>

I grepped all the filesystems, and they all seem to use
generic_drop_inode, except for hugetlbfs, which seems to have the same
logic of (!inode->i_nlink).

> BTW, what happens if one uses inotify on procfs? Or sysfs, for that matter?
> Fundamental problem with that sucker is that you are playing games with
> lifetime rules of inodes in a way that might be OK for some filesystems,
> but violates a lot of assumptions made by other...
>

Honestly, I don't know, and I don't think I know enough to say with any
certainty how either of them would behave. Would a blacklist of
filesystems that don't want inotify on them be acceptable?

> BTW^2, what guarantees that inotify_unmount_inodes() will not happen while we
> are in inotify_release()? That would happily keep watch refcount bumped,
> so it would outlive inotify_unmount_inodes(). Sure, it would be dropped.
> And call iput() on a pinned inode that had outlived the umount(). Oops...

Good catch:

Index: linux/fs/inotify.c
===================================================================
--- linux.orig/fs/inotify.c 2005-08-31 15:41:11.000000000 -0400
+++ linux/fs/inotify.c 2005-09-20 21:18:35.000000000 -0400
@@ -756,6 +756,7 @@
 	 * do not know the inode until we iterate to the watch. But we need to
 	 * hold inode->inotify_sem before dev->sem. The following works.
 	 */
+	down(&iprune_sem);
 	while (1) {
 		struct inotify_watch *watch;
 		struct list_head *watches;
@@ -779,6 +780,7 @@
 		up(&inode->inotify_sem);
 		put_inotify_watch(watch);
 	}
+	up(&iprune_sem);
 
 	/* destroy all of the events on this device */
 	down(&dev->sem);

