Re: [RFC][PATCH] inotify 0.10.0

From: Robert Love
Date: Mon Sep 27 2004 - 16:05:23 EST


On Sun, 2004-09-26 at 21:17 -0700, Andrew Morton wrote:

> Please raise patches against current kernels from
> ftp://ftp.kernel.org/pub/linux/kernel/v2.6/snapshots. This kernel is six
> weeks old.

I have patches against newer kernels at

http://www.kernel.org/pub/linux/kernel/people/rml/inotify

Anyhow, the tidal flood of commentary is appreciated. I have addressed
all of your issues ...

> > +#define INOTIFY_VERSION "0.10.0"
>
> You should plan to remove this - it becomes meainingless once the code
> is merged up, and nobody ever updates it as they patch things.

Agreed. Patch sent.

> +#define INOTIFY_DEV_TIMER_TIME (jiffies + (HZ/4))
>
> ick. Don't hide the logic in a #define.
>
> static inline arm_dev_timer(struct inotify_device *dev)
> {
> mod_timer(&dev->timer, jiffies + HZ/4);
> }
>
> is nicer.

It is used in more than one place, but OK. Patch sent.

> > +/* For debugging */
> > +static int event_object_count;
> > +static int watcher_object_count;
> > +static int inode_ref_count;
>
> OK. These are accessed racily. Either make them atomic_t's or remove them.

They are just statistics. I'd prefer to remove them entirely (I don't
personally think that the debugging code, statistics, etc. should go
into the mainline kernel).

> > +static int find_inode(const char __user *dirname, struct inode **inode)
>
> This can just return an inode*, or an IS_ERR() errno, I think?

Yes, it can. Done, patch sent.

> > +struct inotify_kernel_event *kernel_event(int wd, int mask,
> > + const char *filename)
> > +{
> > + struct inotify_kernel_event *kevent;
> > +
> > + kevent = kmem_cache_alloc(kevent_cache, GFP_ATOMIC);
>
> Try to rearrange things so the allocation mode here can become GFP_KERNEL.

Hrmph.

> Are there ever enough of these objects in flight to justify a standalone
> slab cache?

Yes. There are up to 256*8 possible events and (more importantly than
the net number, I think) they come and go constantly.

> > + watcher = kmem_cache_alloc(watcher_cache, GFP_KERNEL);

Yes. There can be a lot of watches... one of the intended uses of this
is automatic indexing of a user's homedir (think Apple Spotlight). Alan
Cox has also mentioned virus checking, etc.

There are structures are _not_ used as much, and we do not slab cache
them.

> i_lock is documented as an "innermost" lock.
>
> But here, dev->lock is nesting inside it. Not necessarily a bug per-se,
> but it changes and complicates kernel locking rules, and invalidates
> commentary elsewhere.
>
> If possible, please try to avoid using i_lock. Use i_sem instead.
>
> A bug, I think. What happens if another CPU comes in and tries to take these
> two locks in the opposite order?
>
> The same problem applies if you switch to i_sem. The standard fix is to
> take the lowest-addressed lock first. See d_move() for an example.

I'll work on the locking.

> I'll be merging invalidate_inodes-speedup.patch once the 2.6.10 stream
> opens. That will make the above code simpler, faster and quite different.

Sweet.

> > + add_wait_queue(&dev->wait, &wait);
> > +repeat:
> > + if (signal_pending(current)) {
> > + spin_unlock(&dev->lock);
> > + out = -ERESTARTSYS;
> > + set_current_state(TASK_RUNNING);
> > + remove_wait_queue(&dev->wait, &wait);
> > + goto out;
> > + }
> > + set_current_state(TASK_INTERRUPTIBLE);
> > + if (!inotify_dev_has_events(dev)) {
> > + spin_unlock(&dev->lock);
> > + schedule();
> > + spin_lock(&dev->lock);
> > + goto repeat;
> > + }
> > +
> > + set_current_state(TASK_RUNNING);
> > + remove_wait_queue(&dev->wait, &wait);
>
> The above seems a bit clumsy.

John is reworking inotify_read(), which should take care of the various
issues you raised here.

> > +static int inotify_open(struct inode *inode, struct file *file)
> > +{
> > + struct inotify_device *dev;
> > +
> > + if (atomic_read(&watcher_count) == MAX_INOTIFY_DEVS)
> > + return -ENODEV;
> > +
> > + atomic_inc(&watcher_count);
> > +
> > + dev = kmalloc(sizeof(struct inotify_device), GFP_KERNEL);
> > + if (!dev)
> > + return -ENOMEM;
> > +
> > + memset(dev->bitmask, 0,
> > + sizeof(unsigned long) * MAX_INOTIFY_DEV_WATCHERS / BITS_PER_LONG);
>
> What purpose does this bitmask serve, anyway??

Bitmask of allocated/unallocated watcher descriptors.

Patch sent to add a comment. Also sent a patch to use the bitmap.h
functions instead of this open-coded memset().

> > +static void inotify_release_all_watchers(struct inotify_device *dev)
> > +{
> > + struct inotify_watcher *watcher,*next;
> > +
> > + list_for_each_entry_safe(watcher, next, &dev->watchers, d_list)
> > + ignore_helper(watcher, 0);
> > +}
>
> Locking?
>
> > +
> > +static int inotify_release(struct inode *inode, struct file *file)
> > +{
> > + if (file->private_data) {
>
> Why test ->private_data here?

I'm not sure why. I don't think we ought to - release functions are
only called once, when the last ref on the file dies. I asked John and
sent a patch to remove it.

> If it indeed needs testing here, shouldn't it be zeroed out as well, with
> appropriate locking?
>
> > + struct inotify_device *dev;
> > +
> > + dev = (struct inotify_device *) file->private_data;
>
> Please don't typecast when assigning to and from void*'s

Nod. Patch sent.

> > + del_timer_sync(&dev->timer);
> > + inotify_release_all_watchers(dev);
> > + inotify_release_all_events(dev);
> > + kfree(dev);
> > + }
> > +
> > + printk(KERN_ALERT "inotify device released\n");
> > +
> > + atomic_dec(&watcher_count);
>
> If file->private_data was zero, we shouldn't have decremented this?

I don't think we should test file->private_data at all, so after fixing
that, this is fixed.

> +static int inotify_watch(struct inotify_device *dev,
> + struct inotify_watch_request *request)
> > +{
> > ...
> > + spin_lock(&dev->lock);
> > + spin_lock(&inode->i_lock);
>
> Lock ranking.
>
> > +static int inotify_ioctl(struct inode *ip, struct file *fp,
> > + unsigned int cmd, unsigned long arg) {
>
> An errant brace!

Patch sent.

> > +
> > + if (_IOC_DIR(cmd) & _IOC_READ)
> > + err = !access_ok(VERIFY_READ, (void *) arg, _IOC_SIZE(cmd));
> > +
> > + if (err)
> > + err = -EFAULT;
> > + goto out;
> > +
>
> eh? The above is missing braces, and cannot possibly have worked.

That is my fault, just introduced in the latest revision.

> > + if (_IOC_DIR(cmd) & _IOC_WRITE)
> > + err = !access_ok(VERIFY_WRITE, (void *)arg, _IOC_SIZE(cmd));
> > +
> > + if (err) {
> > + err = -EFAULT;
> > + goto out;
> > + }
>
> Why are these access_ok() checks here? I think they can (should) go away.

They should. We should just use copy_{to,from}_user. Will fix.

> We often do:
>
> switch (cmd) {
> case INOTIFY_WATCH:
>
> to save a tabstop.

Yah. I thought I included that in my coding style cleanup. Patch sent.

> > +struct miscdevice inotify_device = {
> > + .minor = MISC_DYNAMIC_MINOR,
> > + .name = "inotify",
> > + .fops = &inotify_fops,
> > +};
>
> Please update devices.txt

Documentation/devices.txt doesn't really have provisions for dynamically
allocated devices, which this is.

We _could_ take a fixed minor...

> > +struct inotify_event {
> > + int wd;
> > + int mask;
> > + int cookie;
> > + char filename[INOTIFY_FILENAME_MAX];
> > +};
>
> yeah, that's not very nice. Better to kmalloc the pathname.

That is the structure that we communicate with to user-space.

We could kmalloc() filename, but it makes the user-space use a bit more
complicated (and right now it is trivial and wonderfully simple).

We've been debating the pros and cons.

> Please add CONFIG_INOTIFY and make all this:
>
> [...]
>
> go away if the user doesn't want inotify. And remember to test with
> CONFIG_INOTIFY=n!

Done. Patch sent.

> > + INIT_LIST_HEAD (&inode->watchers);
>
> Please review the entire patch and ensure that all macros and function
> calls have no space between the identifier and the opening parenthesis.

I thought I caught them all - guess not, patch sent.

Thanks for the feedback.

Robert Love


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/