Re: epoll design problems with common fork/exec patterns

From: Michael Kerrisk
Date: Thu Feb 28 2008 - 08:13:25 EST


On Wed, Feb 27, 2008 at 8:35 PM, Davide Libenzi <davidel@xxxxxxxxxxxxxxx> wrote:
> On Tue, 26 Feb 2008, Chris "ã~B¯" Heath wrote:
>
> > On Tue, 2008-02-26 at 10:51 -0800, Davide Libenzi wrote:
> > >
>
> > > Yes, you can't add the same fd twice. Think about a DB where "file*,fd" is
> > > the key.
> >
> > To clarify, the key appears to be file* plus the user-space integer that
> > represents the fd.
>
> Yes, that's what I said.
>
> > > > c) It is possible to add duplicated file descriptors referring to the same
> > > > underlying open file description ("file *"). As you note, this can be a
> > > > useful filtering technique, if the two file descriptors specify different
> > > > masks.
> > > >
> > > > Assuming that is all correct, for man-pages-2.79, I've reworked the text
> > > > for Q1/A1 as follows:
> > > >
> > > > Q1 What happens if you add the same file descriptor
> > > > to an epoll set twice?
> > > >
> > > > A1 You will probably get EEXIST. However, it is pos-
> > > > sible to add a duplicate (dup(2), dup2(2),
> > > > fcntl(2) F_DUPFD, fork(2)) descriptor to the same
> > > > epoll set. This can be a useful technique for
> > > > filtering events, if the duplicate file descrip-
> > > > tors are registered with different events masks.
> > > >
> > > > Seem okay Davide?
> > >
> > > Looks sane to me.
> >
> > I think fork(2) should not be in the above list. fork(2) duplicates the
> > kernel's fd, but the user-space integer that represents the fd remains
> > the same, so you will get EEXIST if you try to add the fd that was
> > duplicated by fork.
>
> Good catch, fork(2) should not be there.

Okay -- removed.

But it is an ugly inconsistency. On the one hand, a child process
cannot add the duplicate file descriptor to the epoll set. (In every
other case that I can think of , descriptors duplicated by fork have
similar semantics to descriptors duplicated by dup() and friends.) On
the other hand, the very fact that the child has a duplicate of the
descriptor means that even if the parent closes its descriptor, then
epoll_wait() in the parent will continue to receive notifications for
that descriptor because of the duplicated descriptor in the child.

The choice of [file *, fd] as the key for epoll sets really does seem
unfortunate. Keying on [pid, fd] would have given saner semantics, it
seems to me. Obviously it can't be changed now though.

Cheers,

Michael


--
Michael Kerrisk
Maintainer of the Linux man-pages project
http://www.kernel.org/doc/man-pages/
Want to report a man-pages bug? Look here:
http://www.kernel.org/doc/man-pages/reporting_bugs.html
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/