On Fri, 23 Sep 2005, Davide Libenzi wrote:
3. Wakeup
As determined by testing with userland code, the sys_tgkill() and
sys_tkill() functions currently will NOT wake up a sleeping
epoll_wait(). Effectively, this means that epoll_wait() is NOT a pthread
cancellation point. There are two potential issues with this:
- epoll_wait() meets the unofficial(?) definition of a "system call that
may block".
- epoll_wait() behaves differently from poll() and friends.
The epoll_wait() wait loop is the standard one that even poll() uses (prep
wait, make interruptible, test signals, sched timeo). So if poll() is woke
up, so should epoll_wait(). A minimal code snippet that proves poll()
behing woke up, and epoll_wait() not, would help.
Certainly. :-) See end of email for sample program.
I'm afraid you need to bug the glibc guys, since I think they wrap
sys_poll(). Try the test program below, when defining _X_, that makes it
call sys_poll() directly. It will have the same epoll_wait() behaviour.
I'm still a bit confused by how the pthread implementation fits
together. Correct me if the following is wrong, please:
Whenever the user wants to cancel a pthread, glibc eventually calls
{sys-}tgkill() upon the given thread, causing the kernel to return EINTR
to the blocking system call, in this case epoll_wait(). It is glibc's
job to catch this return value and realize that the thread is ready to be
killed, which it is not doing in the case of epoll_wait().
Or is the "current thread has been cancelled and should be killed" check
happening elsewhere / in some other way?
By the way, I already brought this up on the glibc mailing list (before
I sent it to LKML), and it seems they couldn't care less.
(http://sources.redhat.com/ml/libc-alpha/2005-09/msg00071.html)