Re: Strange issues with epoll since 5.0

From: Davidlohr Bueso
Date: Wed Apr 24 2019 - 17:52:59 EST


On Wed, 24 Apr 2019, Davidlohr Bueso wrote:

On Wed, 24 Apr 2019, Eric Wong wrote:

Omar Kilani <omar.kilani@xxxxxxxxx> wrote:
Hi there,

I'm still trying to piece together a reproducible test that triggers
this, but I wanted to post in case someone goes "hmmm... change X
might have done this".

Maybe Davidlohr knows, since he's responsible for most of the
epoll changes in 5.0.

Not really, I have not been made aware of any issues until now.


Basically, something's broken (or at least, has changed enough to
cause problems in user space) in epoll since 5.0. It's still broken in
5.1-rc5.

It doesn't happen 100% of the time. It's sort of hard to pin down, but
I've observed the following:

* nginx not accepting connections under load
* A Java app that uses Netty / NIO having strange writability
semantics on channels, which confuses Netty / Java enough that it does
not properly flush written data to the socket.

Off the top of my head, could the following be responsible?

c5a282e9635 (fs/epoll: reduce the scope of wq lock in epoll_wait())