Re: and nicer too - Re: [PATCH] epoll more scalable than poll

From: bert hubert (ahu@ds9a.nl)
Date: Tue Oct 29 2002 - 08:09:43 EST

Next message: Duncan Sands: "Race in net/socket.c?"
Previous message: Martin Waitz: "Re: and nicer too - Re: [PATCH] epoll more scalable than poll"
In reply to: Davide Libenzi: "Re: and nicer too - Re: [PATCH] epoll more scalable than poll"
Next in thread: Davide Libenzi: "Re: and nicer too - Re: [PATCH] epoll more scalable than poll"
Reply: Davide Libenzi: "Re: and nicer too - Re: [PATCH] epoll more scalable than poll"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Mon, Oct 28, 2002 at 04:57:12PM -0800, Davide Libenzi wrote:

> The epoll interface has to be used with non-blocking fds. The EAGAIN
> return code from read/write tells you that you can go safely to wait for
> events for that fd because you making the read/write to return EAGAIN, you
> consumed the whole I/O space for that fd. Consuming the whole I/O space
> meant that you brought the signal to zero ( talking in ee terms ), and a
> followinf 0->1 transaction will trigger the event. Where 1 means I/O space
> available ...

I tried to modify the mtasker webserver
(http://ds9a.nl/mtasker/releases/mtasker-0.2.tar.gz) to work with epoll
instead of select and, well, this is awkward.

The right way to use epoll is (schematically);

int clientfd=accept();
setnonblocking(clientfd);

epfd=epoll_create();
epoll_ctl(epfd,EP_CTL_ADD, clientfd, POLLIN);

        for(;;) {
                if(read(clientfd)<0 && errno==EAGAIN) {
                        epoll_wait(epfd);
                        continue;
                }
                epoll_ctl(epfd,EP_CTL_DEL, clientfd); // remove again
                break;
        }

This requires having the fd in epoll before trying to read, which means a
weird interface where you have to register your interest, try to read, and
unregister your interface in case it succeeded.

This means two epoll syscalls per read.

This is all solved if epoll_ctl() creates an edge if it finds that the poll
condition is met at insert time.

The way it works now is way more awkward then using regular poll(), the way
it works now is very easy to do wrong because of this awkwardness. Even
semi-correct which zeroes the pollstate before calling epoll is wrong:

        for(;;) {
                if(read(clientfd)<0 && errno==EAGAIN) {
                        waitOn(clientfd); /* function which registers our
                                              interest with a central
                                              dispatcher and waits */
                        continue;
                }
                break;
        }

Code like this would appear in many userspace threading abstractions, like
GNU Pth or mtasker. Instead, we need:

        for(;;) {
                registerReadInterest(clientfd);
                if(read(clientfd)<0 && errno==EAGAIN) {
                        waitOn(clientfd); /* function which waits for our
                                              interest be satisfied */
                        continue;
                }
                deleteReadInterest(clientfd);
                break;
        }

If epoll_ctl make epoll_wait report an edge in case it finds that there is
already data, all this is way way simpler, allowing:

        waitOn(clientfd); /* function which registers interest and waits for
                             an edge to appear */
        read(clientfd);

Regards,

bert

-- 
http://www.PowerDNS.com          Versatile DNS Software & Services
http://lartc.org           Linux Advanced Routing & Traffic Control HOWTO
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Next message: Duncan Sands: "Race in net/socket.c?"
Previous message: Martin Waitz: "Re: and nicer too - Re: [PATCH] epoll more scalable than poll"
In reply to: Davide Libenzi: "Re: and nicer too - Re: [PATCH] epoll more scalable than poll"
Next in thread: Davide Libenzi: "Re: and nicer too - Re: [PATCH] epoll more scalable than poll"
Reply: Davide Libenzi: "Re: and nicer too - Re: [PATCH] epoll more scalable than poll"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

This archive was generated by hypermail 2b29 : Thu Oct 31 2002 - 22:00:41 EST