Re: Linux's implementation of poll() not scalable?

From: Linus Torvalds (torvalds@transmeta.com)
Date: Tue Oct 24 2000 - 01:35:54 EST


On Mon, 23 Oct 2000, Dan Kegel wrote:
>
> http://www.FreeBSD.org/cgi/man.cgi?query=kqueue&apropos=0&sektion=0&manpath=FreeBSD+5.0-current&format=html
> describes the FreeBSD kqueue interface for events:

I've actually read the BSD kevent stuff, and I think it's classic
over-design. It's not easy to see what it's all about, and the whole <kq,
ident, filter> tuple crap is just silly. Looks much too complicated.

I don't believe in the "library" argument at all, and I think multiple
event queues completely detract from the whole point of being simple to
use and implement.

Now, I agree that my bind_event()/get_event() has limitations, and the
biggest one is probably the event "id". It needs to be better, and it
needs to have more structure. The "id" really should be something that not
only contains the "fd", but also contains the actor function to be called,
along with some opaque data for that function.

In fact, if you take my example server, and move the "handle[id]()" call
_into_ get_event() (and make the "handle[id]()" function pointer a part
of the ID of the event), then the library argument goes away too: it
doesn't matter _who_ calls the get_event() function, because the end
result is always going to be the same regardless of whether it is called
from within a library or from a main loop: it's going to call the handler
function associated with the ID that triggered.

Basically, the main loop would boil down to

        for (;;) {
                static struct event ev_list[MAXEV];
                get_event(ev_list, MAXEV, &tmout);
                /* ... timeout handling here ... */
        }

because get_event() would end up making all the user-mode calls too (so
"get_event()" is no longer a system call: it's a system call + a for-loop
to call all the ID handler functions that were associated with the events
that triggered).

So the "struct event" would just be:

        struct event {
                int fd;
                unsigned long mask;
                void *opaque;
                void (*event_fn)(int fd, unsigned long mask, void *opaque);
        };

and there's no need for separate event queues, because the separate event
queues have been completely subsumed by the fact that every single event
has a separate event function.

So now you'd start everything off (assuming the same kind of "listen to
everything and react to it" server as in my previous example) by just
calling

        bind_event(sock, POLLIN, NULL, accept_fn);

which basically creates the event inside the kernel; whenever it
triggers, the kernel hands it back through the event array filled in by
the "__get_event()" system call, so the get_event() library function
basically looks like

        int get_event(struct event *array, int maxevents, struct timeval *tv)
        {
                int nr = __get_event(array, maxevents, tv);
                int i;
                for (i = 0; i < nr; i++) {
                        array->event_fn(array->fd, array->mask, array->opaque);
                        array++;
                }
                return nr;
        }
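Since "__get_event()" was only a proposal, the library function above cannot run as written. A minimal sketch, assuming a user-space stand-in: the hypothetical sys_get_event() (renamed because identifiers starting with two underscores are reserved in C) fakes the system call with POSIX poll() over a fixed table that an equally hypothetical bind_event() fills in. It keeps the shape of the design (one queue, a handler per event) but none of the scalability, since every call still scans every bound descriptor.

```c
#include <assert.h>
#include <poll.h>
#include <sys/time.h>
#include <unistd.h>

/* The mail's struct event, with "ind" corrected to "int". */
struct event {
        int fd;
        unsigned long mask;
        void *opaque;
        void (*event_fn)(int fd, unsigned long mask, void *opaque);
};

#define MAXBIND 64                      /* hypothetical table size */

static struct event bound[MAXBIND];     /* what the kernel would remember */
static int nbound;

/* Hypothetical bind_event(): record fd, mask, opaque data and handler. */
int bind_event(int fd, unsigned long mask, void *opaque,
               void (*fn)(int, unsigned long, void *))
{
        if (nbound == MAXBIND)
                return -1;
        bound[nbound++] = (struct event){ .fd = fd, .mask = mask,
                                          .opaque = opaque, .event_fn = fn };
        return 0;
}

/* Hypothetical stand-in for the proposed __get_event(), built on poll():
 * copy up to maxevents triggered events into array, mask = revents. */
static int sys_get_event(struct event *array, int maxevents,
                         struct timeval *tv)
{
        struct pollfd pfd[MAXBIND];
        int i, n, nr = 0;
        int ms = tv ? (int)(tv->tv_sec * 1000 + tv->tv_usec / 1000) : -1;

        for (i = 0; i < nbound; i++)
                pfd[i] = (struct pollfd){ .fd = bound[i].fd,
                                          .events = (short)bound[i].mask };
        n = poll(pfd, (nfds_t)nbound, ms);
        if (n <= 0)
                return n;               /* 0: timeout, -1: error */
        for (i = 0; i < nbound && nr < maxevents; i++)
                if (pfd[i].revents) {
                        array[nr] = bound[i];
                        array[nr++].mask = (unsigned long)pfd[i].revents;
                }
        return nr;
}

/* The library function from the mail, calling the stand-in above. */
int get_event(struct event *array, int maxevents, struct timeval *tv)
{
        int nr = sys_get_event(array, maxevents, tv);
        int i;

        for (i = 0; i < nr; i++) {
                array->event_fn(array->fd, array->mask, array->opaque);
                array++;
        }
        return nr;
}

/* Demo handler (hypothetical): consume one byte, bump the counter. */
static void count_fn(int fd, unsigned long mask, void *opaque)
{
        char c;

        assert(opaque != NULL);
        if ((mask & POLLIN) && read(fd, &c, 1) == 1)
                ++*(int *)opaque;
}

/* Demo: bind one pipe, make it readable once, call get_event() twice.
 * Returns how often count_fn saw data (1), or -1 if setup fails. */
int demo(void)
{
        static int count;               /* static: the binding outlives us */
        int p[2];
        struct event ev[8];
        struct timeval tv = { .tv_sec = 1 };    /* wait at most 1 s */

        if (pipe(p) != 0 || bind_event(p[0], POLLIN, &count, count_fn) != 0)
                return -1;
        if (write(p[1], "x", 1) != 1)
                return -1;
        get_event(ev, 8, &tv);          /* readable: count_fn runs */
        tv.tv_sec = 0;
        get_event(ev, 8, &tv);          /* byte consumed: nothing fires */
        return count;
}
```

Because poll() is level-triggered, any events beyond maxevents are simply reported again on the next call rather than lost.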

and tell me why you'd want to have multiple event queues any more?

(In fact, you might as well move the event array completely inside
"get_event()", because nobody would need to look at the raw array any
more. So the "get_event()" interface would be even simpler: just a
timeout, nothing more. Talk about simple programming.)
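With the array moved inside, a timeout is the only argument left. A self-contained sketch of that variant, again emulated on poll() under the same assumptions (hypothetical bind_event(), fixed-size MAXEV table, no real kernel support). The demo uses two pipes to mimic the accept pattern: the handler on the first pipe binds a handler for the second, the way accept_fn would bind each new connection.

```c
#include <assert.h>
#include <poll.h>
#include <sys/time.h>
#include <unistd.h>

/* Same struct event as in the mail; the array of them is now private. */
struct event {
        int fd;
        unsigned long mask;
        void *opaque;
        void (*event_fn)(int fd, unsigned long mask, void *opaque);
};

#define MAXEV 64                        /* hypothetical capacity */

static struct event bound[MAXEV];       /* stands in for kernel state */
static int nbound;

/* Hypothetical bind_event(), same arguments as in the mail. */
int bind_event(int fd, unsigned long mask, void *opaque,
               void (*fn)(int, unsigned long, void *))
{
        if (nbound == MAXEV)
                return -1;
        bound[nbound++] = (struct event){ .fd = fd, .mask = mask,
                                          .opaque = opaque, .event_fn = fn };
        return 0;
}

/* Timeout-only get_event(): wait up to *tv (NULL = forever), run every
 * triggered handler, return how many ran (0 on timeout, -1 on error). */
int get_event(struct timeval *tv)
{
        struct pollfd pfd[MAXEV];
        struct event fired[MAXEV];      /* the event array, now internal */
        int i, n, nr = 0, nb = nbound;
        int ms = tv ? (int)(tv->tv_sec * 1000 + tv->tv_usec / 1000) : -1;

        for (i = 0; i < nb; i++)
                pfd[i] = (struct pollfd){ .fd = bound[i].fd,
                                          .events = (short)bound[i].mask };
        n = poll(pfd, (nfds_t)nb, ms);
        if (n <= 0)
                return n;
        for (i = 0; i < nb; i++)
                if (pfd[i].revents) {
                        fired[nr] = bound[i];
                        fired[nr++].mask = (unsigned long)pfd[i].revents;
                }
        /* Dispatch from the snapshot, so handlers may bind new events. */
        for (i = 0; i < nr; i++)
                fired[i].event_fn(fired[i].fd, fired[i].mask, fired[i].opaque);
        return nr;
}

/* Demo handlers (hypothetical): data_fn counts bytes on a "connection";
 * accept_like_fn plays accept_fn, binding data_fn for the fd in *opaque. */
static int data_seen;

static void data_fn(int fd, unsigned long mask, void *opaque)
{
        char c;

        (void)opaque;
        if ((mask & POLLIN) && read(fd, &c, 1) == 1)
                data_seen++;
}

static void accept_like_fn(int fd, unsigned long mask, void *opaque)
{
        char c;

        assert(opaque != NULL);
        if ((mask & POLLIN) && read(fd, &c, 1) == 1)
                bind_event(*(int *)opaque, POLLIN, NULL, data_fn);
}

/* Two pipes: the first acts as the listening socket, the second as the
 * accepted connection. Returns data_fn's call count (1), or -1 on error. */
int demo(void)
{
        static int conn[2];             /* static: bound via &conn[0] */
        int lst[2];
        struct timeval tv = { .tv_sec = 1 };

        if (pipe(lst) != 0 || pipe(conn) != 0)
                return -1;
        if (bind_event(lst[0], POLLIN, &conn[0], accept_like_fn) != 0)
                return -1;
        if (write(lst[1], "c", 1) != 1 || write(conn[1], "d", 1) != 1)
                return -1;
        get_event(&tv);                 /* accept_like_fn binds conn[0] */
        get_event(&tv);                 /* data_fn reads from conn[0] */
        return data_seen;
}
```

The caller's loop then shrinks to calling get_event() with a timeout; which handler runs is decided entirely by the bindings.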

(This is also the ideal event programming interface: signals get racy and
hard to handle, while in the above example you can trivially just be
single-threaded. Which doesn't mean that you CANNOT be multi-threaded if
you want to: you multi-thread things by just having multiple threads that
all call "get_event()" on their own).

                Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/