Re: [PATCH] media: v4l2-mem2mem: always call poll_wait() on queues

From: Alexandre Courbot
Date: Tue Nov 03 2020 - 03:51:40 EST


Hi Hans,

On Sat, Oct 31, 2020 at 12:09 AM Hans Verkuil <hverkuil-cisco@xxxxxxxxx> wrote:
>
> On 22/10/2020 14:24, Alexandre Courbot wrote:
> > do_poll()/do_select() seem to set the _qproc member of poll_table to
> > NULL the first time they are called on a given table, making subsequent
> > calls of poll_wait() on that table no-ops. This is a problem for mem2mem
> > which calls poll_wait() on the V4L2 queues' waitqueues only when a
> > queue-related event is requested, which may not necessarily be the case
> > during the first poll.
> >
> > For instance, a stateful decoder is typically only interested in
> > EPOLLPRI events when it starts, and will switch to listening to both
> > EPOLLPRI and EPOLLIN after receiving the initial resolution change event
> > and configuring the CAPTURE queue. However by the time that switch
> > happens and v4l2_m2m_poll_for_data() is called for the first time,
> > poll_wait() has become a no-op and the V4L2 queues waitqueues thus
> > cannot be registered.
> >
> > Fix this by moving the registration to v4l2_m2m_poll() and do it whether
> > or not one of the queue-related events are requested.
>
> This looks good, but would it be possible to add a test for this to
> v4l2-compliance? (Look for POLL_MODE_EPOLL in v4l2-test-buffers.cpp)
>
> If I understand this right, calling EPOLL_CTL_ADD for EPOLLPRI, then
> calling EPOLL_CTL_ADD for EPOLLIN/OUT would trigger this? Or does there
> have to be an epoll_wait call in between?

Even without an epoll_wait() in between the behavior is visible.
v4l2_m2m_poll() will be called once during the initial EPOLL_CTL_ADD
and this will trigger the bug.

> Another reason for adding this test is that I wonder if regular capture
> or output V4L2 devices don't have the same issue.
>
> It's a very subtle bug and so adding a test for this to v4l2-compliance
> would be very useful.

I fully agree, this is very counter-intuitive since what basically
happens is that the kernel's poll_wait() function becomes a no-op
after the poll() hook of a driver is called for the first time. There
is no way one can expect this behavior just from browsing the code so
this is likely to affect other drivers.

As for the test itself, we can easily reproduce the conditions for
failure in v4l2-test-buffers.cpp's captureBufs() function, but doing
so will make the streaming tests fail without being specific about the
cause. Or maybe we should add another pollmode to specifically test
epoll in this setup? Can I get your thoughts?

Cheers,
Alex.