Re: [PATCH] epoll: try to be a _bit_ better about file lifetimes

From: Christian Brauner
Date: Mon May 06 2024 - 05:27:14 EST


On Mon, May 06, 2024 at 10:45:35AM +0200, Christian Brauner wrote:
> > The fact is, it's not dma-buf that is violating any rules. It's epoll.
>
> I agree that epoll() not taking a reference on the file is at least
> unexpected and contradicts the usual code patterns for the sake of
> performance and that it very likely is the case that most callers of
> f_op->poll() don't know this.
>
> Note, I cleary wrote upthread that I'm ok to do it like you suggested
> but raised two concerns a) there's currently only one instance of
> prolonged @file lifetime in f_op->poll() afaict and b) that there's
> possibly going to be some performance impact on epoll().
>
> So it's at least worth discussing what's more important because epoll()
> is very widely used and it's not that we haven't favored performance
> before.
>
> But you've already said that you aren't concerned with performance on
> epoll() upthread. So afaict then there's really not a lot more to
> discuss other than take the patch and see whether we get any complaints.

Two closing thoughts:

(1) I wonder if this won't cause userspace regressions for the semantics
of epoll because dying files are now silently ignored whereas before
they'd generated events.

(2) The other part is that this seems to me that epoll() will now
temporarly pin filesystems opening up the possibility for spurious
EBUSY errors.

If you register a file descriptor in an epoll instance and then
close it and umount the filesystem but epoll managed to do an fget()
on that fd before that close() call then epoll will pin that
filesystem.

If the f_op->poll() method does something that can take a while
(blocks on a shared mutex of that subsystem) that umount is very
likely going to return EBUSY suddenly.

Afaict, before that this wouldn't have been an issue at all and is
likely more serious than performance.

(One option would be to only do epi_fget() for stuff like
dma-buf that's never unmounted. That'll cover nearly every
driver out there. Only "real" filesystems would have to contend with
@file count going to zero but honestly they also deal with dentry
lookup under RCU which is way more adventurous than this.)

Maybe I'm barking up the wrong tree though.