Re: [patch] inotify.

From: Neil Brown
Date: Mon Jun 20 2005 - 22:36:14 EST

On Monday June 20, ttb@xxxxxxxxxxxxxxxx wrote:
> On Tue, 2005-06-21 at 10:51 +1000, Neil Brown wrote:
> > I haven't yet decided if I like it or not, but maybe I have some
> > useful comments to make...
> >
> > > +
> > > +Q: What is the design decision behind using an-fd-per-device as
> > > +opposed to an fd-per-watch?
> > > +
> > > +A: An fd-per-watch quickly consumes more file descriptors than
> > > +are allowed, more fd's than are feasible to manage, and more
> > > +fd's than are ideally select()-able. Yes, root can bump the
> > > +per-process fd limit and yes, users can use epoll, but requiring
> > > +both is silly and an extraneous requirement. A watch consumes
> > > +less memory than an open file, separating the number spaces is
> > > +thus sensible. The current design is what user-space developers
> > > +want: Users open the device, once, and add n watches, requiring
> > > +but one fd and no twiddling with fd limits.
> > > +Opening /dev/inotify two thousand times is silly. If we can
> > > +implement user-space's preferences cleanly--and we can, the idr
> > > +layer makes stuff like this trivial--then we should.
> > > +
> >
> >
> > To address the particular points:
> > - "An fd-per-watch quickly consumes more file descriptors than
> > are allowed"
> >
> > I note that the current limit of 'wd's is 8192 per user, compared
> > with the default 'fd' limit of 1024 per process. So if a user has
> > more than 8 processes making very heavy use of inotify, they would
> > be better off with the file-descriptor limit...
> > So I don't find this argument convincing.
> >
> > I would suggest that it be removed, and you put in a separate
> > section discussing resource usage and limits. You could explain why
> > you don't use rlimit (probably not a hard case to win) and why the
> > sysadmin cannot give different limits to different users (as a
> > sys-admin, I would fight that), and why no one user would ever want
> > more than about 8 processes using inotify :-)
> >
> Looking at beagle, it is a single process that can easily watch more
> than 1024 directories. Beagle would have to work around the 1024 limit
> by managing processes, each of which watch a fraction of the total
> watches. PITA. Also, 8192 was an arbitrary choice that seemed big enough
> for most users' needs, and all of these limits can be set in sysfs.
> Check out /sys/misc/inotify.

I think you missed my point...
The FAQ entry which I found unconvincing said

"Yes, root can bump the per-process fd limit ... but requiring [this]
is silly and an extraneous requirement."

So when you say "... all of these limits can be set in sysfs", you are
effectively contradicting the FAQ entry.
Yes, 1024 is an arbitrary limit. But so is 8192.
The argument "We cannot use fd's because the arbitrary limit is lower
than the arbitrary limit that I want to use" simply doesn't hold.
There may well be other good arguments against 'fd's, but I'm trying
to point out that this isn't one of them, and so shouldn't appear in
this part of the FAQ.

> > - "more fd's than are feasible to manage,"
> >
> > It's not clear what this means. The program will still need to
> > manage lots of 'wd's. How is this different from handling 'fd's?
> > The only 'manage'ment issue I could come up with is the hassle of
> > shepherding all these 'wd's through a 'fork', only to close them
> > before an 'exec'. Is that what you mean? If so it would help to
> > make it more explicit.
> >
> The program doesn't really have to manage the wd's. A program simply
> needs a map from wd to its own structure.

So I think you are agreeing with me that the text "more fd's than are
feasible to manage," should be removed from the FAQ? Thanks.
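
To make the "map from wd to its own structure" concrete, here is a
rough sketch of what a program might keep. It assumes wd values are
small non-negative integers, as an idr-style allocator would hand out;
struct watch_info and the wd_map_* helpers are hypothetical names and
not part of inotify.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-watch state kept by the application. */
struct watch_info {
    const char *path;
};

/* Growable array indexed directly by wd; empty slots hold NULL. */
struct wd_map {
    struct watch_info **slots;
    size_t len;
};

/* Associate wd with info. Returns 0 on success, -1 on bad wd or OOM. */
static int wd_map_set(struct wd_map *m, int wd, struct watch_info *info)
{
    assert(m != NULL);
    if (wd < 0)
        return -1;
    if ((size_t)wd >= m->len) {
        size_t new_len = m->len ? m->len : 16;
        struct watch_info **p;
        size_t i;

        while (new_len <= (size_t)wd)
            new_len *= 2;
        p = realloc(m->slots, new_len * sizeof(*p));
        if (p == NULL)
            return -1;
        for (i = m->len; i < new_len; i++)
            p[i] = NULL;            /* mark the new slots as unused */
        m->slots = p;
        m->len = new_len;
    }
    m->slots[wd] = info;
    return 0;
}

/* Look up the structure for a wd read from an event; NULL if unknown. */
static struct watch_info *wd_map_get(const struct wd_map *m, int wd)
{
    assert(m != NULL);
    if (wd < 0 || (size_t)wd >= m->len)
        return NULL;
    return m->slots[wd];
}
```

The same table would work equally well keyed by fd, which is the point
of the question above.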

> Assume for a moment that we had chosen a fd-as-watch approach. Now take
> beagle, which has tons of watches. Every time beagle goes through its
> loop, it has to add each of those watch-fd's to the select table, then
> check for each one afterwards. It's pretty easy to see how this is
> unmanageable and a waste of CPU time.

I don't think there can be any question that having to tell select
about each 'watch' is unreasonable.
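
A sketch of the per-iteration work that an fd-per-watch design would
push onto a select() user, to show where the cost comes from.
build_watch_set() and the watch_fds array are hypothetical; only the
FD_* macros and FD_SETSIZE are the real select() interface.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>

/*
 * Rebuild the read set from scratch. select() overwrites the set on
 * return, so this has to run on every pass through the main loop.
 * Returns the highest fd (the caller passes max + 1 as nfds), or -1 if
 * the list is empty or some fd cannot be represented in an fd_set.
 */
static int build_watch_set(const int *watch_fds, size_t n, fd_set *set)
{
    int max_fd = -1;
    size_t i;

    assert(set != NULL);
    FD_ZERO(set);
    for (i = 0; i < n; i++) {
        /* fd_set only covers descriptors below FD_SETSIZE */
        if (watch_fds[i] < 0 || watch_fds[i] >= FD_SETSIZE)
            return -1;
        FD_SET(watch_fds[i], set);
        if (watch_fds[i] > max_fd)
            max_fd = watch_fds[i];
    }
    return max_fd;
}
```

After select() returns, the caller needs a second O(n) pass with
FD_ISSET() to find which watches fired. With a single inotify fd, both
passes collapse to one FD_SET() and one FD_ISSET().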

> Why bother though? The idr layer lets us have an integer that maps to a
> data structure in kernel space, pretty much for free.

Because some people think that creating a whole new abstraction when
we have one (fds) that seems to fit the bill, is the wrong thing to do.
I don't currently have an opinion on that, but I am trying to explore
options with a bit of lateral thinking.
