Re: [PATCH] poll: prevent missed events if _qproc is NULL

From: Eric Wong
Date: Tue Jan 01 2013 - 16:00:26 EST


Eric Dumazet <eric.dumazet@xxxxxxxxx> wrote:
> On Mon, 2012-12-31 at 13:21 +0000, Eric Wong wrote:
> > This patch seems to fix my issue with ppoll() being stuck on my
> > SMP machine: http://article.gmane.org/gmane.linux.file-systems/70414
> >
> > The change to sock_poll_wait() in
> > commit 626cf236608505d376e4799adb4f7eb00a8594af
> > (poll: add poll_requested_events() and poll_does_not_wait() functions)
> > seems to have allowed additional cases where the SMP memory barrier
> > is not issued before checking for readiness.
> >
> > In my case, this affects the select()-family of functions
> > which register descriptors once and set _qproc to NULL before
> > checking events again (after poll_schedule_timeout() returns).
> > The set_mb() barrier in poll_schedule_timeout() appears to be
> > insufficient on my SMP x86-64 machine (as it's only an xchg()).
> >
> > This may also be related to the epoll issue described by
> > Andreas Voellmy in http://thread.gmane.org/gmane.linux.kernel/1408782/
>
> Hmm, the change seems not very logical to me.

My original description was not complete and I'm still bisecting
my problem (ppoll + send stuck). However, my patch does solve the
issue Andreas encountered and I now understand why.

> If it helps, I would like to understand the real issue.
>
> commit 626cf236608505d376e4799adb4f7eb00a8594af should not have this
> side effect, at least for poll()/select() functions. The epoll() changes
> I am not yet very confident.

I have a better explanation of the epoll problem below.

An alternate version (limited to epoll) would be:

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index cd96649..ca5f3d0 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -1299,6 +1299,7 @@ static int ep_modify(struct eventpoll *ep, struct epitem *epi, struct epoll_even
* Get current event bits. We can safely use the file* here because
* its usage count has been increased by the caller of this function.
*/
+ smp_mb();
revents = epi->ffd.file->f_op->poll(epi->ffd.file, &pt);

/*

> I suspect a race already existed before this commit, it would be nice to
> track it properly.

I don't believe this race existed before that change.

Updated commit message below: