Re: [take24 3/6] kevent: poll/select() notifications.

From: Davide Libenzi
Date: Thu Nov 09 2006 - 15:11:05 EST


On Thu, 9 Nov 2006, Davide Libenzi wrote:

> On Thu, 9 Nov 2006, Evgeniy Polyakov wrote:
>
> > On Thu, Nov 09, 2006 at 10:51:56AM -0800, Davide Libenzi (davidel@xxxxxxxxxxxxxxx) wrote:
> > > On Thu, 9 Nov 2006, Evgeniy Polyakov wrote:
> > >
> > > > +static int kevent_poll_callback(struct kevent *k)
> > > > +{
> > > > + if (k->event.req_flags & KEVENT_REQ_LAST_CHECK) {
> > > > + return 1;
> > > > + } else {
> > > > + struct file *file = k->st->origin;
> > > > + unsigned int revents = file->f_op->poll(file, NULL);
> > > > +
> > > > + k->event.ret_data[0] = revents & k->event.event;
> > > > +
> > > > + return (revents & k->event.event);
> > > > + }
> > > > +}
> > >
> > > You need to be careful that file->f_op->poll is not called inside the
> > > spin_lock_irqsave/spin_lock_irqrestore pair, since (even this came up
> > > during epoll developemtn days) file->f_op->poll might do a simple
> > > spin_lock_irq/spin_unlock_irq. This unfortunate constrain forced epoll to
> > > have a suboptimal double O(R) loop to handle LT events.
> >
> > It is tricky - users call wake_up() from any context, which in turn ends
> > up calling kevent_storage_ready(), which calls kevent_poll_callback() with
> > KEVENT_REQ_LAST_CHECK bit set, which becomes almost empty call in fast
> > path. Since callback returns 1, kevent will be queued into ready queue,
> > which is processed on behalf of syscalls - in that case kevent will
> > check the flag and since KEVENT_REQ_LAST_CHECK is set, will call
> > callback again to check if kevent is correctly marked, but already
> > without that flag (it happens in syscall context, i.e. process context
> > without any locks held), so callback calls ->poll(), which can sleep,
> > but it is safe. If ->poll() returns 'ready' value, kevent is transfers
> > data into userspace, otherwise it is 'requeued' (just removed from
> > ready queue).
>
> Oh, mine was only a general warn. I hadn't looked at the generic code
> before. But now that I poke on it, I see:
>
> void kevent_requeue(struct kevent *k)
> {
> unsigned long flags;
>
> spin_lock_irqsave(&k->st->lock, flags);
> __kevent_requeue(k, 0);
> spin_unlock_irqrestore(&k->st->lock, flags);
> }
>
> and then:
>
> static int __kevent_requeue(struct kevent *k, u32 event)
> {
> int ret, rem;
> unsigned long flags;
>
> ret = k->callbacks.callback(k);
>
> Isn't the k->callbacks.callback() possibly end up calling f_op->poll?

Ack, there the check for KEVENT_REQ_LAST_CHECK inside the callback.
The problem with f_op->poll was not that it can sleep (not excluded
though) but that some f_op->poll can do a simple spin_lock_irq/spin_unlock_irq.
But for a quick peek your new code seems fine with that.



- Davide


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/