Re: ioevent queues (was Re: Proposed new poll2() syscall)

Dean Gaudet (dgaudet-list-linux-kernel@arctic.org)
Sat, 23 Aug 1997 12:08:15 -0700 (PDT)


On Sat, 23 Aug 1997, Erik Corry wrote:

> > Warning, this is long, but I think worth it. If you've heard of NT's
> > completion ports that's where I'm heading.
>
> Basically, POSIX AIO delivers a signal on AIO
> completion. You can attach one lump of user-supplied data
> (like a void*) to a signal (yow!) and in the handler you
> should be able to write your special data to a message queue
> (am I right, here? what if the message queue is full and
> blocks?). Your main program loop just reads messages out of
> the queue.

As long as you can handle these signals with a single thread this could
work. But see the next point.

> Is there a good reason why
> this doesn't allow you to include child deaths in your select
> events?

Yeah, I know of the "self pipe" trick, but I haven't benchmarked it yet. I
suspect that the need to start taking EINTRs all over the place might
hurt, especially since you'll take an EINTR while in select() because of
the SIGCHLD, then restart the select(), then find out it's a dead child,
and read(). So it costs a signal, a write(), a select() and a read().

The select() in this case times out after one second anyhow because other
work needs to be done. So it's just as easy to run a waitpid(WNOHANG) loop
to pick up dead children, then select(). (And far more portable, since
Apache 1.3 isn't where I intend to look into the really cool non-portable
stuff.)

> database people use AIO for) so that only a kernel thread is
> necessary, that has no user-space stack, etc.

Yeah, I knew this was a concern ... and I was thinking it might be solvable
by having I/O threads: threads whose context is only used to complete I/O
events.

> Presumably you would want to implement the server so that
> you can have two processes picking up data from the message
> queue on a dual processor.

That's the whole point of it.

> As a simpler solution to the Apache dilemma, what about an
> fcntl(fd, WAKE_UP_ONE), which means only one process is
> woken from select? Apparently, FreeBSD has this, though it
> may be implicit on fds returned from socket(), which would
> violate the required semantics of select.

FreeBSD added WAKE_UP_ONE as the default for accept(); I doubt they
changed select() at all. Single socket servers are far easier to deal
with -- and Apache does have a single socket special case (well, my
development copy does). There are issues regarding the implementation of
this ... right now sockets only have one wait queue, and everyone is
awakened when "something happens". To implement it you would probably
have to put two queues on each socket -- one only for accept(), and one
for the rest. Then when a new connection is ready, wake up a single
accept() waiter and wake up everything on the old, general list. I think
that preserves semantics sufficiently.
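
The two-queue wakeup rule can be modelled in a few lines of user-space C.
This is only a toy illustration of the policy, not kernel code: every type
and function name here (toy_socket, wq_add, toy_connection_ready, ...) is
made up, and unlike real wait queues the model removes a waiter as soon as
it is woken.

```c
#include <stddef.h>

#define MAX_WAITERS 16

struct waiter {
    int id;
    int awake;                          /* set to 1 when woken */
};

struct wait_queue {
    struct waiter *w[MAX_WAITERS];
    size_t n;
};

struct toy_socket {
    struct wait_queue accept_q;         /* sleepers blocked in accept() */
    struct wait_queue other_q;          /* everyone else: select(), read(), ... */
};

/* Put a waiter to sleep on a queue. Returns 0, or -1 if the queue is full. */
int wq_add(struct wait_queue *q, struct waiter *w)
{
    if (q->n == MAX_WAITERS)
        return -1;
    w->awake = 0;
    q->w[q->n++] = w;
    return 0;
}

/* Wake and remove the oldest waiter. Returns 1 if one was woken, else 0. */
int wq_wake_one(struct wait_queue *q)
{
    size_t i;

    if (q->n == 0)
        return 0;
    q->w[0]->awake = 1;
    for (i = 1; i < q->n; i++)
        q->w[i - 1] = q->w[i];
    q->n--;
    return 1;
}

/* Wake and remove every waiter. Returns how many were woken. */
int wq_wake_all(struct wait_queue *q)
{
    int woken = 0;

    while (wq_wake_one(q))
        woken++;
    return woken;
}

/*
 * New connection ready: wake a single accept() sleeper, plus everything on
 * the general queue so select() callers still see the event.
 */
int toy_connection_ready(struct toy_socket *s)
{
    return wq_wake_one(&s->accept_q) + wq_wake_all(&s->other_q);
}
```

With two accept() sleepers and two select() sleepers, one connection wakes
three of them: the first accept() sleeper and both select() sleepers.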

Dean