Re: [PATCH] epoll: try to be a _bit_ better about file lifetimes

From: Christian Brauner
Date: Sat May 04 2024 - 05:37:47 EST


On Fri, May 03, 2024 at 02:33:37PM -0700, Linus Torvalds wrote:
> On Fri, 3 May 2024 at 14:24, Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:
> >
> > Can we get to ep_item_poll(epi, ...) after eventpoll_release_file()
> > got past __ep_remove()? Because if we can, we have a worse problem -
> > epi freed under us.
>
> Look at the hack in __ep_remove(): if it is concurrent with
> eventpoll_release_file(), it will hit this code
>
> 	spin_lock(&file->f_lock);
> 	if (epi->dying && !force) {
> 		spin_unlock(&file->f_lock);
> 		return false;
> 	}
>
> and not free the epi.
>
> But as far as I can tell, almost nothing else cares about the f_lock
> and dying logic.
>
> And in fact, I don't think doing
>
> spin_lock(&file->f_lock);
>
> is even valid in the places that look up file through "epi->ffd.file",
> because the lock itself is inside the thing that you can't trust until
> you've taken the lock...
>
> So I agree with Kees about the use of "atomic_dec_not_zero()" kind of
> logic - but it also needs to be in an RCU-readlocked region, I think.

Why isn't it enough to just force dma_buf_poll() to use
get_file_active()? Then that whole problem goes away afaict.
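
For reference, get_file_active() only hands back a reference if the file's count has not already dropped to zero. A poll racing with the final fput() therefore gets NULL instead of resurrecting a dying file. A minimal userspace sketch of that "increment unless zero" pattern, using C11 atomics, looks roughly like this. struct obj, obj_get_active() and obj_put() are hypothetical names for illustration only. The sketch also omits the RCU protection the kernel needs to keep the memory valid while it inspects the count.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical refcounted object standing in for struct file. */
struct obj {
	atomic_long refcount;	/* 0 means "being released, do not touch" */
};

/*
 * Take a reference only if the count is still non-zero
 * ("inc_not_zero"). Returns the object on success, NULL if the last
 * reference is already gone. In the kernel this check must run while
 * the memory is guaranteed to stay valid (e.g. under RCU); this model
 * simply assumes the caller ensures that.
 */
static struct obj *obj_get_active(struct obj *o)
{
	long old = atomic_load(&o->refcount);

	do {
		if (old == 0)
			return NULL;	/* already dying: refuse */
		/* on failure, 'old' is reloaded with the current value */
	} while (!atomic_compare_exchange_weak(&o->refcount, &old, old + 1));

	return o;
}

/* Drop a reference; returns true if this was the last one. */
static bool obj_put(struct obj *o)
{
	return atomic_fetch_sub(&o->refcount, 1) == 1;
}
```

The compare-and-exchange loop is what distinguishes this from a plain get_file(): an unconditional increment would happily bump a count of zero back to one.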

So the fix I had yesterday, before I had to step away from the computer,
was literally just that [1]. It potentially does two atomic increments,
but the dma folks can probably make it smarter about when a file
reference is actually needed.
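
To make the "two atomic increments" concrete: with the patch applied, the EPOLLOUT path takes one conditional reference for the duration of dma_buf_poll() and a second, unconditional one via get_file() that the fence callback drops later. The model below is a hypothetical single-object sketch of that accounting only. struct obj, model_poll(), cb_done() and the other helpers are made-up names, not dma-buf API, and the sketch ignores locking, fences and the real release path.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the struct file behind a dma-buf. */
struct obj {
	atomic_long refcount;
	bool cb_pending;	/* models a queued fence callback */
};

/* Conditional acquire, in the spirit of get_file_active(). */
static bool get_active(struct obj *o)
{
	long old = atomic_load(&o->refcount);

	do {
		if (old == 0)
			return false;
	} while (!atomic_compare_exchange_weak(&o->refcount, &old, old + 1));
	return true;
}

/* Unconditional acquire, in the spirit of get_file(). */
static void get_ref(struct obj *o)
{
	atomic_fetch_add(&o->refcount, 1);
}

/* Release, in the spirit of fput(); actual teardown is not modelled. */
static void put_ref(struct obj *o)
{
	atomic_fetch_sub(&o->refcount, 1);
}

/* Models the completion callback dropping the reference taken for it. */
static void cb_done(struct obj *o)
{
	o->cb_pending = false;
	put_ref(o);
}

/*
 * Models the patched poll flow. Returns -1 (EPOLLERR stand-in) if the
 * object is already dying, 0 if no events were requested, 1 otherwise.
 */
static int model_poll(struct obj *o, bool want_out)
{
	if (!get_active(o))	/* increment #1: conditional, call-scoped */
		return -1;

	if (!want_out) {
		put_ref(o);	/* every exit path drops increment #1 */
		return 0;
	}

	get_ref(o);		/* increment #2: owned by the callback */
	o->cb_pending = true;

	put_ref(o);		/* drop increment #1 before returning */
	return 1;
}
```

In this model only increment #2 outlives the call; increment #1 exists purely to pin the object while poll runs, which is presumably the part that could be avoided when no callback ends up being queued.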

>
> I wish epoll() just took the damn file ref itself. But since it relies
> on the file refcount to release the data structure, that obviously
> can't work.
>
> Linus
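
The circularity Linus describes can be shown with a tiny single-threaded model: the release hook is what tears down the epoll registration, and it only runs once the last reference goes away, so a registration that pinned the file would keep it alive forever. struct file_model and the model_*() helpers are hypothetical; model_release() merely stands in for the eventpoll cleanup done from the file's own teardown, with no locking or RCU.

```c
#include <stdbool.h>

/* Hypothetical file with one optional watcher (models an epitem). */
struct file_model {
	long refcount;
	bool watched;	/* a watcher is registered on this file */
	bool released;	/* release ran and tore the watcher down */
};

/* Stand-in for the epoll cleanup run when the file goes away. */
static void model_release(struct file_model *f)
{
	f->watched = false;
	f->released = true;
}

/* Drop a reference; release only runs when the count hits zero. */
static void model_put(struct file_model *f)
{
	if (--f->refcount == 0)
		model_release(f);
}

/*
 * Register a watcher. If watcher_holds_ref is true the watcher pins
 * the file, but the only code that would drop that pin is release,
 * which cannot run while the pin exists.
 */
static void model_watch(struct file_model *f, bool watcher_holds_ref)
{
	f->watched = true;
	if (watcher_holds_ref)
		f->refcount++;
}
```

With a pinning watcher, the user's last put leaves the count at 1, release never runs, and both the watcher and the file leak. That is why the registration cannot simply take its own file reference.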

diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index 8fe5aa67b167..7149c45976e1 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -244,13 +244,18 @@ static __poll_t dma_buf_poll(struct file *file, poll_table *poll)
 	if (!dmabuf || !dmabuf->resv)
 		return EPOLLERR;

+	if (!get_file_active(&dmabuf->file))
+		return EPOLLERR;
+
 	resv = dmabuf->resv;

 	poll_wait(file, &dmabuf->poll, poll);

 	events = poll_requested_events(poll) & (EPOLLIN | EPOLLOUT);
-	if (!events)
+	if (!events) {
+		fput(file);
 		return 0;
+	}

 	dma_resv_lock(resv, NULL);

@@ -268,7 +273,6 @@ static __poll_t dma_buf_poll(struct file *file, poll_table *poll)
 		if (events & EPOLLOUT) {
 			/* Paired with fput in dma_buf_poll_cb */
 			get_file(dmabuf->file);
-
 			if (!dma_buf_poll_add_cb(resv, true, dcb))
 				/* No callback queued, wake up any other waiters */
 				dma_buf_poll_cb(NULL, &dcb->cb);
@@ -301,6 +305,7 @@ static __poll_t dma_buf_poll(struct file *file, poll_table *poll)
 	}

 	dma_resv_unlock(resv);
+	fput(file);
 	return events;
 }