Re: [PATCH liburing v2 0/1] test: add epoll test case

From: Stefano Garzarella
Date: Fri Feb 07 2020 - 11:51:44 EST


On Fri, Jan 31, 2020 at 08:39:46AM -0700, Jens Axboe wrote:
> On 1/31/20 7:29 AM, Stefano Garzarella wrote:
> > Hi Jens,
> > this is a v2 of the epoll test.
> >
> > v1 -> v2:
> > - if IORING_FEAT_NODROP is not available, avoid to overflow the CQ
> > - add 2 new tests to test epoll with IORING_FEAT_NODROP
> > - cleanups
> >
> > There are 4 sub-tests:
> > 1. test_epoll
> > 2. test_epoll_sqpoll
> > 3. test_epoll_nodrop
> > 4. test_epoll_sqpoll_nodrop
> >
> > In the first 2 tests, I try to avoid to queue more requests than we have room
> > for in the CQ ring. These work fine, I have no faults.
>
> Thanks!
>
> > In the tests 3 and 4, if IORING_FEAT_NODROP is supported, I try to submit as
> > much as I can until I get a -EBUSY, but they often fail in this way:
> > the submitter manages to submit everything, the receiver receives all the
> > submitted bytes, but the cleaner loses completion events (I also tried to put a
> > timeout to epoll_wait() in the cleaner to be sure that it is not related to the
> > patch that I send some weeks ago, but the situation doesn't change, it's like
> > there is still overflow in the CQ).
> >
> > Next week I'll try to investigate better which is the problem.
>
> Does it change if you have an io_uring_enter() with GETEVENTS set? I wonder if
> you just pruned the CQ ring but didn't flush the internal side.
>

Just an update: after the "io_uring: flush overflowed CQ events in the
io_uring_poll()" the test 3 works well.

Now the problem is the test 4 (with sqpoll). It works in most cases, but it
fails a few times in this way:
- the submitter freezes after submitting X requests
- the cleaner and the consumer see X-2 requests (2 are the entries in
the queue)

I tried to put a timeout on the submitter's epoll and do an io_uring_submit()
to wake up the kthread (if we lose some notifications), but the problem seems
to be somewhere else. I think a race somewhere.

Any suggestion on how to debug this case?
I'll try with tracing.

Thanks,
Stefano