Re: [patch] inotify.

From: Neil Brown
Date: Mon Jun 20 2005 - 20:02:29 EST

On Friday June 17, rml@xxxxxxxxxx wrote:
> On Fri, 2005-06-17 at 17:40 -0400, Robert Love wrote:
> > I wrote this up the first time you asked:
> >
> > Documentation/filesystems/inotify.txt
> If there is something I should add, just holler. I will write an
> encyclopedia if need be.

I haven't yet decided if I like it or not, but maybe I have some
useful comments to make...

> +
> +Q: What is the design decision behind using an-fd-per-device as
> +opposed to an fd-per-watch?
> +
> +A: An fd-per-watch quickly consumes more file descriptors than
> +are allowed, more fd's than are feasible to manage, and more
> +fd's than are ideally select()-able. Yes, root can bump the
> +per-process fd limit and yes, users can use epoll, but requiring
> +both is silly and an extraneous requirement. A watch consumes
> +less memory than an open file, separating the number spaces is
> +thus sensible. The current design is what user-space developers
> +want: Users open the device, once, and add n watches, requiring
> +but one fd and no twiddling with fd limits.
> +Opening /dev/inotify two thousand times is silly. If we can
> +implement user-space's preferences cleanly--and we can, the idr
> +layer makes stuff like this trivial--then we should.
> +

To address the particular points:
- "An fd-per-watch quickly consumes more file descriptors than
are allowed"

I note that the current limit of 'wd's is 8192 per user, compared
with the default 'fd' limit of 1024 per process. So if a user has
more than 8 processes making very heavy use of inotify, they would
be better off with the file-descriptor limit...
So I don't find this argument convincing.
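
To make the arithmetic above concrete, here is a minimal sketch using
only the two figures quoted (8192 'wd's per user, 1024 'fd's per
process). The function and constant names are made up for
illustration; they are not part of inotify.

```c
#include <assert.h>

/* Figures quoted in the text above. */
enum { WD_LIMIT_PER_USER = 8192, FD_LIMIT_PER_PROCESS = 1024 };

/*
 * Number of processes that could each fill a whole fd table before
 * their combined total reaches the per-user watch limit. Beyond this
 * count, the per-process fd limit would be the more generous one.
 */
unsigned int processes_until_wd_limit(unsigned int wd_per_user,
                                      unsigned int fd_per_process)
{
    assert(fd_per_process > 0);
    return wd_per_user / fd_per_process;
}
```

With the quoted figures, processes_until_wd_limit(WD_LIMIT_PER_USER,
FD_LIMIT_PER_PROCESS) gives 8, which is where the "more than 8
processes" comes from.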

I would suggest that it be removed, and that you put in a separate
section discussing resource usage and limits. You could explain why
you don't use rlimit (probably not a hard case to win) and why the
sysadmin cannot give different limits to different users (as a
sys-admin, I would fight that), and why no one user would ever want
more than about 8 processes using inotify :-)

- "more fd's than are feasible to manage,"

It's not clear what this means. The program will still need to
manage lots of 'wd's. How is this different from handling 'fd's?
The only 'manage'ment issue I could come up with is the hassle of
shepherding all these 'wd's through a 'fork', only to close them
before an 'exec'. Is that what you mean? If so it would help to
make it more explicit.
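
For reference, the fork/exec hassle with an fd-per-watch would look
roughly like the following, repeated once per descriptor. This is a
sketch using the standard fcntl() close-on-exec flag; set_cloexec is
a hypothetical helper name.

```c
#include <assert.h>
#include <fcntl.h>

/*
 * Mark fd close-on-exec so it does not leak into an exec'd program.
 * Returns 0 on success, -1 on error (errno is set by fcntl).
 */
int set_cloexec(int fd)
{
    int flags;

    assert(fd >= 0);
    flags = fcntl(fd, F_GETFD);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFD, flags | FD_CLOEXEC);
}
```

With one fd per watch this has to be done for every watch; with a
single fd holding all the watches it is one call.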

Using the fd from /dev/inotify like a 'bag' of 'wd's which can be
closed as a whole or passed around (through fork or exec or even
over a unix domain socket) is, I think, a strong point. You could
possibly leverage that in selling the current inotify interface.
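
A minimal sketch of the 'bag' idea follows. Note the assumption: it
uses the inotify_init()/inotify_add_watch() calls from
<sys/inotify.h>, not the /dev/inotify device interface discussed in
this thread, and watch_paths is a hypothetical helper name. The event
mask is an arbitrary choice for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/inotify.h>
#include <unistd.h>

/*
 * Open one inotify instance and add a watch for each of the n paths.
 * Assumption: the <sys/inotify.h> syscall interface, not /dev/inotify.
 * Returns the single fd holding all watches, or -1 on failure.
 * wds[i] receives the watch descriptor for paths[i].
 */
int watch_paths(const char *const *paths, size_t n, int *wds)
{
    int fd;
    size_t i;

    assert(paths != NULL && wds != NULL);
    fd = inotify_init();
    if (fd < 0)
        return -1;
    for (i = 0; i < n; i++) {
        wds[i] = inotify_add_watch(fd, paths[i], IN_MODIFY | IN_CREATE);
        if (wds[i] < 0) {
            close(fd);  /* one close() drops every watch added so far */
            return -1;
        }
    }
    return fd;
}
```

The caller ends up with one fd however many paths are watched, and a
single close(fd) releases the lot.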

- "and more fd's than are ideally select()-able"
and "users can use epoll, but requiring [it] is silly and an
extraneous requirement"

Having to use epoll *should* not be a problem.
"Surely", thought I, "you can just select on the fd returned by
epoll_create, and then adding inotify to your select() loop should
be just as easy with epoll as with /dev/inotify". But then I
discovered that you cannot select() on the epoll fd :-(
Maybe this should be fixed rather than be used as an excuse.
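
The behaviour can be checked on a given kernel with a small probe
like the one below. It is only a sketch, and epoll_fd_is_selectable
is a hypothetical name. The epoll(7) manual page on current systems
says an epoll fd with events waiting is itself reported as readable,
so the result may differ from what is described above.

```c
#include <assert.h>
#include <sys/epoll.h>
#include <sys/select.h>
#include <unistd.h>

/*
 * Returns 1 if select() reports the epoll fd readable while one of
 * its registered fds has pending data, 0 if it does not, and -1 on
 * a setup error.
 */
int epoll_fd_is_selectable(void)
{
    int pipefd[2], epfd, ret = -1;
    struct epoll_event ev = { .events = EPOLLIN };
    struct timeval tv = { 0, 100000 };  /* 100 ms timeout */
    fd_set rfds;

    if (pipe(pipefd) < 0)
        return -1;
    epfd = epoll_create(1);
    if (epfd < 0)
        goto out_pipe;
    assert(epfd < FD_SETSIZE);          /* required by FD_SET() */

    /* Register the pipe's read end, then make it readable. */
    ev.data.fd = pipefd[0];
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, pipefd[0], &ev) < 0)
        goto out_ep;
    if (write(pipefd[1], "x", 1) != 1)
        goto out_ep;

    /* Now ask select() about the epoll fd itself. */
    FD_ZERO(&rfds);
    FD_SET(epfd, &rfds);
    ret = select(epfd + 1, &rfds, NULL, NULL, &tv);
    if (ret > 0)
        ret = FD_ISSET(epfd, &rfds) ? 1 : 0;
out_ep:
    close(epfd);
out_pipe:
    close(pipefd[0]);
    close(pipefd[1]);
    return ret;
}
```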

- "A watch consumes less memory than an open filep"

This is largely valid. Two other possible responses:

1/ Make the filep smaller. Not every 'struct file' needs
read-ahead data (which is BIG). Not every file needs the
fown_struct. I suspect that a small and generic struct file could
be created which was embedded inside a larger less generic struct
file when needed, if the size of filep is really an issue.

2/ An fd doesn't *have* to correspond to a 'struct file'.
Suppose we allowed the 'watch_fd' system call to return an fd which
actually mapped to a struct inotify_watch. Possibly using a few
high bits to differentiate different types of 'fds'.
Most system calls would reject these fds immediately. Those that we
wanted to handle them could easily be fixed.
So sys_close would know how to close them. sys_fcntl would need to be
able to set close-on-exec. Maybe a few others like poll.
But from user-space, it would be 'just another fd'.

You wouldn't be able to use these 'fds' in select(), but I think we
can all agree that you probably wouldn't want to.
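
The "few high bits" idea in 2/ could be sketched like this. It is
purely hypothetical: the choice of bit 30 and all of the names are
invented for illustration, and nothing like this exists in the patch.

```c
#include <assert.h>

/* Hypothetical: bit 30 set marks an fd as a watch, not a struct file. */
#define WATCH_FD_FLAG (1 << 30)

/* Build the user-visible 'fd' for a given watch table index. */
int make_watch_fd(int watch_index)
{
    assert(watch_index >= 0 && watch_index < WATCH_FD_FLAG);
    return watch_index | WATCH_FD_FLAG;
}

/* Negative values are never valid fds, so exclude them explicitly. */
int is_watch_fd(int fd)
{
    return fd >= 0 && (fd & WATCH_FD_FLAG) != 0;
}

/* Recover the watch table index from a tagged fd. */
int watch_fd_index(int fd)
{
    assert(is_watch_fd(fd));
    return fd & ~WATCH_FD_FLAG;
}
```

A call such as close would test is_watch_fd() first and dispatch to
the watch code, while calls that do not understand these fds would
reject them on the same test.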

It is also worth noting that there is a more telling reason to not
use "struct file" than the size. A 'struct file' points to a
dentry. A 'struct inotify_watch' points to an inode. This
difference is fairly crucial to how inotify works.
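
One way to see the name/inode distinction from user space is with
hard links: two names, one inode. The sketch below is only an
illustration; names_share_inode and the /tmp path are assumptions of
the example, and it says nothing about inotify internals.

```c
#define _DEFAULT_SOURCE     /* for mkdtemp() under strict -std= modes */
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Create a file and a hard link to it in a fresh temporary directory,
 * then compare device and inode numbers. Returns 1 if both names
 * refer to the same inode, 0 if not, -1 on error.
 */
int names_share_inode(void)
{
    char dir[] = "/tmp/inode-demo-XXXXXX";
    char a[64], b[64];
    struct stat sa, sb;
    int fd, n, ret = -1;

    if (mkdtemp(dir) == NULL)
        return -1;
    n = snprintf(a, sizeof(a), "%s/a", dir);
    assert(n > 0 && (size_t)n < sizeof(a));
    n = snprintf(b, sizeof(b), "%s/b", dir);
    assert(n > 0 && (size_t)n < sizeof(b));

    fd = open(a, O_CREAT | O_WRONLY | O_EXCL, 0600);
    if (fd < 0)
        goto out;
    close(fd);

    /* Second name for the same object. */
    if (link(a, b) == 0 && stat(a, &sa) == 0 && stat(b, &sb) == 0)
        ret = (sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino);
    unlink(b);
out:
    unlink(a);
    rmdir(dir);
    return ret;
}
```

Something tied to one name would only cover that name, whereas
something attached to the inode covers the object whichever name is
used to reach it.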

I hope some of these reflections are useful.
