Re: ioevent queues (was Re: Proposed new poll2() syscall)

Erik Corry (erik@arbat.com)
Sat, 23 Aug 1997 21:24:59 +0200 (MET DST)


> Warning, this is long, but I think worth it. If you've heard of NT's
> completion ports that's where I'm heading.

You can get what you want from a combination of Queued
Signals, Posix message queues and Asynchronous IO, all
defined in Posix.1b (previously named Posix.4). Markus Kuhn
keeps track of how Posix.1b support is coming along in
Linux; there's a summary available from his home page at
<http://wwwcip.informatik.uni-erlangen.de/~mskuhn/>. The
best reference I know on Posix.1b is Bill O. Gallmeister's
O'Reilly book, "POSIX.4: Programming for the Real World".
The Open Group must also have something on it at
<http://www.rdg.opengroup.org/unix/>.

Basically, Posix AIO can deliver a signal when an operation
completes. You can attach one lump of user-supplied data
(like a void*) to a signal (yow!) and in the handler you
should be able to write your special data to a message queue
(am I right, here? what if the message queue is full and
blocks?). Your main program loop just reads messages out of
the queue.
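
A rough sketch of that handler-to-queue pattern, under some
assumptions of mine: sigqueue() to ourselves stands in for
the AIO completion notification, and a non-blocking pipe
stands in for the message queue, because write() is on the
Posix list of async-signal-safe functions while mq_send() is
not. With a non-blocking pipe a full queue drops the event
instead of blocking the handler. The names event_pipe,
setup_event_queue, post_completion and next_event are made
up for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <unistd.h>

/* Hypothetical event queue: [0] is the read end, [1] the write end. */
static int event_pipe[2];

/* New-style (SA_SIGINFO) handler: forward the pointer attached to the signal. */
static void on_completion(int signo, siginfo_t *info, void *ucontext)
{
    int saved_errno = errno;
    void *cookie = info->si_value.sival_ptr;

    (void)signo;
    (void)ucontext;
    /* Non-blocking write: if the pipe is full the event is dropped. */
    if (write(event_pipe[1], &cookie, sizeof cookie) < 0) {
        /* nothing safe to do here; a real server would count drops */
    }
    errno = saved_errno;
}

/* Create the pipe and install the handler on one real-time signal. */
int setup_event_queue(void)
{
    struct sigaction sa;

    if (pipe(event_pipe) < 0)
        return -1;
    if (fcntl(event_pipe[1], F_SETFL, O_NONBLOCK) < 0)
        return -1;
    sa.sa_sigaction = on_completion;
    sa.sa_flags = SA_SIGINFO | SA_RESTART;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGRTMIN, &sa, NULL);
}

/* Stand-in for an AIO completion: queue a signal carrying `cookie`. */
int post_completion(void *cookie)
{
    union sigval value;

    assert(cookie != NULL);   /* NULL is reserved for "no event" below */
    value.sival_ptr = cookie;
    return sigqueue(getpid(), SIGRTMIN, value);
}

/* One step of the main loop: block until an event arrives, return its cookie. */
void *next_event(void)
{
    void *cookie = NULL;
    ssize_t n;

    do {
        n = read(event_pipe[0], &cookie, sizeof cookie);
    } while (n < 0 && errno == EINTR);
    return n == (ssize_t)sizeof cookie ? cookie : NULL;
}
```

The main loop would call setup_event_queue() once and then
sit in next_event(); with real AIO the same cookie would be
supplied through the sigevent attached to the request rather
than through post_completion().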

The only thing that seems to be missing is the
message-on-SIGCHLD thing. Posix.1b doesn't seem to prohibit
having a new-style signal handler for old signals like
SIGCHLD, and in this case the user-supplied void* would be
unused, so we could put the pid of the dead child there.
Then you could write a message saying what child died.
Actually, it seems like you could do that right now: you set
up a child-died-pipe, and have the signal handler write a
byte to it when SIGCHLD arrives. Is there a good reason why
this doesn't allow you to include child deaths in your
select events?
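
Here is a minimal sketch of the child-died-pipe, with a few
assumptions. It uses a new-style handler for SIGCHLD and
writes the child's pid (taken from si_pid in the siginfo)
rather than a single byte, so the reader knows which child
died. The names child_pipe, setup_child_pipe and
wait_for_child_death are mine. Ordinary signals are not
queued, so several children dying at once may produce only
one SIGCHLD; a real server would probably also loop on
waitpid(-1, ..., WNOHANG).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <sys/select.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical child-died-pipe: [0] goes into the select set. */
static int child_pipe[2];

/* SA_SIGINFO handler: report the pid of the child that changed state. */
static void on_sigchld(int signo, siginfo_t *info, void *ucontext)
{
    int saved_errno = errno;
    pid_t pid = info->si_pid;

    (void)signo;
    (void)ucontext;
    /* Non-blocking write, so a full pipe cannot hang the handler. */
    if (write(child_pipe[1], &pid, sizeof pid) < 0) {
        /* event dropped */
    }
    errno = saved_errno;
}

/* Create the pipe and install the SIGCHLD handler. */
int setup_child_pipe(void)
{
    struct sigaction sa;

    if (pipe(child_pipe) < 0)
        return -1;
    if (fcntl(child_pipe[1], F_SETFL, O_NONBLOCK) < 0)
        return -1;
    sa.sa_sigaction = on_sigchld;
    sa.sa_flags = SA_SIGINFO | SA_RESTART | SA_NOCLDSTOP;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGCHLD, &sa, NULL);
}

/* Sleep in select() until a child death is reported, reap it, return its pid. */
pid_t wait_for_child_death(void)
{
    fd_set readable;
    pid_t pid;
    int rc;

    do {
        FD_ZERO(&readable);
        FD_SET(child_pipe[0], &readable);
        /* Other fds (sockets etc.) would be added to the same set. */
        rc = select(child_pipe[0] + 1, &readable, NULL, NULL, NULL);
    } while (rc < 0 && errno == EINTR);   /* SIGCHLD itself interrupts select */
    if (rc < 0)
        return -1;
    if (read(child_pipe[0], &pid, sizeof pid) != (ssize_t)sizeof pid)
        return -1;
    waitpid(pid, NULL, 0);                /* collect the zombie */
    return pid;
}
```

After setup_child_pipe(), a fork() followed by
wait_for_child_death() in the parent should return the pid
of the child once it exits.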

The next question is how to implement asynchronous IO
efficiently. In practice you need a thread for each
outstanding IO because the kernel is written under the
assumption that there is a kernel stack for each IO going
on. David Miller and Mike Jagdis are looking at doing some
clever trickery for AIO on raw devices (which is what
database people use AIO for) so that only a kernel thread,
with no user-space stack etc., is necessary.
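
To make the thread-per-outstanding-IO cost concrete, here is
a user-space sketch of it using Posix threads. This is only
an illustration of the idea, not how any kernel or libc
actually implements AIO. Each request gets its own thread
that does a blocking pread() and then posts the request
pointer to a completion pipe. struct io_request,
submit_read and wait_completion are names I made up.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical request record; the caller owns it until completion. */
struct io_request {
    int     fd;        /* file to read from */
    void   *buf;       /* destination buffer */
    size_t  len;       /* bytes wanted */
    off_t   offset;    /* file offset */
    ssize_t result;    /* filled in by the worker: bytes read or -1 */
    int     done_fd;   /* write end of the completion pipe */
};

/* One thread (and so one stack) per outstanding request. */
static void *io_worker(void *arg)
{
    struct io_request *req = arg;

    req->result = pread(req->fd, req->buf, req->len, req->offset);
    /* Post the request pointer as the completion message. */
    if (write(req->done_fd, &req, sizeof req) != (ssize_t)sizeof req)
        req->result = -1;
    return NULL;
}

/* Start the read in the background; returns 0 on success. */
int submit_read(struct io_request *req)
{
    pthread_t tid;
    int rc;

    assert(req != NULL);
    rc = pthread_create(&tid, NULL, io_worker, req);
    if (rc == 0)
        pthread_detach(tid);   /* nobody joins; completion comes via the pipe */
    return rc;
}

/* Block until some request finishes and return it. */
struct io_request *wait_completion(int done_read_fd)
{
    struct io_request *req = NULL;

    if (read(done_read_fd, &req, sizeof req) != (ssize_t)sizeof req)
        return NULL;
    return req;
}
```

With N requests in flight this holds N threads, which is
exactly the overhead the raw-device work is trying to avoid.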

Presumably you would want to implement the server so that
you can have two processes picking up data from the message
queue on a dual processor. Or picking up data from two
message queues.

As a simpler solution to the Apache dilemma, what about an
fcntl(fd, WAKE_UP_ONE), which means only one process is
woken from select? Apparently, FreeBSD has this, though it
may be implicit on fds returned from socket(), which would
violate the required semantics of select.

-- 
Erik Corry erik@arbat.com http://inet.uni-c.dk/~ehcorry/