Re: Unifying epoll,aio,futexes etc. (What I really want from epoll)

From: Jamie Lokier (lk@tantalophile.demon.co.uk)
Date: Fri Nov 01 2002 - 15:45:42 EST


Davide Libenzi wrote:
> > In other words, add another op to sys_futex() called FUTEX_EPOLL which
> > directly registers the futex on an epoll interest list, and let epoll
> > report those events as futex events.
>
> Jamie, the futex support can be easily done with one line of code patch. I
> still prefer the one-to-one mapping between futexes and files.

I forgot something important: futex notifcations must be _exactly
counted_ for some uses of futexes. It's all very subtle, but there's
an example in Rusty's futex library where a token is passed to one of
the waiters, and waiters are queued up behind each other in the order
they started waiting. (See futex_up_fair() in usersem.h). You need
this to prevent starvation, with Alan's example of waiting for
multiple futexes being a particularly nasty case.

Because of this, and the way your one-liner works, I think* that a
multi-threaded program will need to allocate one fd per waiter to
guarantee the counter - not one fd per waited-upon futex. So when
1000 threads are waiting on some global mutex (as happens), they'll
need an fd each - they can't share one.

  * - If I'm wrong about this, please someone correct me.

Consequently, fds will need to be allocated when a thread wants to
wait, instead of lazily once per contended futex - hence a higher rate
of allocations and deallocations.

The fixes for this are twofold:

    1. You must change file_send_notify() so that it takes a count which
       limits the number of notifications (like FUTEX_WAKE), and returns
       the number of notifications sent.

    2. The futex's queue of waiters must contain the epoll waiters _and_
       waitqueue waiters, in the order that they started waiting. It's
       not enough to wake the epoll waiters first, and if any
       notifications are left, wake the others, nor vice versa.

Futex epolls are a bit fiddly.

-- Jamie
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Thu Nov 07 2002 - 22:00:21 EST