Re: and nicer too - Re: [PATCH] epoll more scalable than poll

From: Davide Libenzi (davidel@xmailserver.org)
Date: Tue Oct 29 2002 - 16:25:24 EST

Next message: Christoph Hellwig: "Re: [BK updates] fbdev changes updates."
Previous message: James Simmons: "Re: [BK updates] fbdev changes updates."
In reply to: bert hubert: "Re: and nicer too - Re: [PATCH] epoll more scalable than poll"
Next in thread: Hanna Linder: "Re: and nicer too - Re: [PATCH] epoll more scalable than poll"
Reply: Hanna Linder: "Re: and nicer too - Re: [PATCH] epoll more scalable than poll"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Tue, 29 Oct 2002, bert hubert wrote:

> On Mon, Oct 28, 2002 at 04:57:12PM -0800, Davide Libenzi wrote:
>
> > The epoll interface has to be used with non-blocking fds. The EAGAIN
> > return code from read/write tells you that you can go safely to wait for
> > events for that fd because you making the read/write to return EAGAIN, you
> > consumed the whole I/O space for that fd. Consuming the whole I/O space
> > meant that you brought the signal to zero ( talking in ee terms ), and a
> > followinf 0->1 transaction will trigger the event. Where 1 means I/O space
> > available ...
>
> I tried to modify the mtasker webserver
> (http://ds9a.nl/mtasker/releases/mtasker-0.2.tar.gz) to work with epoll
> instead of select and, well, this is awkward.
>
> The right way to use epoll is (schematically);
>
> int clientfd=accept();
> setnonblocking(clientfd);
>
> epfd=epoll_create();
> epoll_ctl(epfd,EP_CTL_ADD, clientfd, POLLIN);
>
> for(;;) {
> if(read(clientfd)<0 && errno==EAGAIN) {
> epoll_wait(epfd);
> continue;
> }
> epoll_ctl(epfd,EP_CTL_DEL, clientfd); // remove again
> break;
> }
>
> This requires having the fd in epoll before trying to read, which means a
> weird interface where you have to register your interest, try to read, and
> unregister your interface in case it succeeded.
>
> This means two epoll syscalls per read.

Bert, you don't need to do that. Like I wrote in the latest man pages, you
can easily add the fd inside the interest set with POLLIN | POLLOUT when
the fd born ( accept/connect ) and leave it there for its whole life. The
fact that the API is edge triggered garanties you that you won't receive
any events when the fd is not used. I personallt tried to measure the
number of stale events and is basically ( better definitely ) uninfluent.

> This is all solved if epoll_ctl() creates an edge if it finds that the poll
> condition is met at insert time.
>
> The way it works now is way more awkward then using regular poll(), the way
> it works now is very easy to do wrong because of this awkwardness. Even
> semi-correct which zeroes the pollstate before calling epoll is wrong:
>
> for(;;) {
> if(read(clientfd)<0 && errno==EAGAIN) {
> waitOn(clientfd); /* function which registers our
> interest with a central
> dispatcher and waits */
> continue;
> }
> break;
> }
>
> Code like this would appear in many userspace threading abstractions, like
> GNU Pth or mtasker. Instead, we need:
>
> for(;;) {
> registerReadInterest(clientfd);
> if(read(clientfd)<0 && errno==EAGAIN) {
> waitOn(clientfd); /* function which waits for our
> interest be satisfied */
> continue;
> }
> deleteReadInterest(clientfd);
> break;
> }
>
> If epoll_ctl make epoll_wait report an edge in case it finds that there is
> already data, all this is way way simpler, allowing:
>
> waitOn(clientfd); /* function which registers interest and waits for
> an edge to appear */
> read(clientfd);

You can do all this easily, expecially with API that has a core dispatch
loop like GNU PTH. The example http server used to test it is an example
on how to use it ( with coroutines ).

Hanna, is it possible for you guys to cleanup ephttpd and make it an
example program for sys_epoll ?

- Davide

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Next message: Christoph Hellwig: "Re: [BK updates] fbdev changes updates."
Previous message: James Simmons: "Re: [BK updates] fbdev changes updates."
In reply to: bert hubert: "Re: and nicer too - Re: [PATCH] epoll more scalable than poll"
Next in thread: Hanna Linder: "Re: and nicer too - Re: [PATCH] epoll more scalable than poll"
Reply: Hanna Linder: "Re: and nicer too - Re: [PATCH] epoll more scalable than poll"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

This archive was generated by hypermail 2b29 : Thu Oct 31 2002 - 22:00:44 EST