Re: [PATCH] epoll: try to be a _bit_ better about file lifetimes

From: Hillf Danton
Date: Sat May 04 2024 - 05:26:37 EST


On Fri, 3 May 2024 22:24:28 +0100 Al Viro wrote:
> On Fri, May 03, 2024 at 02:11:30PM -0700, Linus Torvalds wrote:
> > epoll is a mess, and does various invalid things in the name of
> > performance.
> >
> > Let's try to rein it in a bit. Something like this, perhaps?
>
> > +/*
> > + * The ffd.file pointer may be in the process of
> > + * being torn down due to being closed, but we
> > + * may not have finished eventpoll_release() yet.
> > + *
> > + * Technically, even with the atomic_long_inc_not_zero,
> > + * the file may have been free'd and then gotten
> > + * re-allocated to something else (since files are
> > + * not RCU-delayed, they are SLAB_TYPESAFE_BY_RCU).
>
> Can we get to ep_item_poll(epi, ...) after eventpoll_release_file()
> got past __ep_remove()? Because if we can, we have a worse problem -

Nope but mtx can help poll go before remove, see below.

> epi freed under us.
>
> If not, we couldn't possibly have reached ->release() yet, let
> alone freeing anything.

Actually poll can see a file with zero f_count, and LT's idea with
trival change survived the syzbot repro [1].

I think fput currently can race with epoll wrt f_count, and checking
it in dma-buf is necessary if his idea looks too aggressive.

wait_epoll() __fput()
do_epoll_wait() eventpoll_release_file()
ep_poll()
ep_send_events()
mutex_lock(&ep->mtx)
ep_item_poll()
vfs_poll()
mutex_unlock(&ep->mtx)
mutex_lock(&ep->mtx)
dispose = __ep_remove(ep, epi, true)
mutex_unlock(&ep->mtx)

[1] https://lore.kernel.org/lkml/000000000000f1c99d061798ac6d@xxxxxxxxxx/