Re: [PATCH 1/2] kernfs: add kernfs_ops.free operation to free resources tied to the file

From: Tejun Heo
Date: Mon Jul 10 2023 - 16:38:55 EST


Hello,

On Fri, Jun 30, 2023 at 10:21:17AM +0200, Christian Brauner wrote:
> What I'm mostly reacting to is that there's a kernfs_ops->release()
> method which mirrors f_op->release() but can be called when there are
> still users which is counterintuitive for release semantics. And that
> ultimately caused this UAF issue which was rather subtle given how long
> it took to track down the root cause.
>
> A rmdir() isn't triggering a f_op->release() if there are still file
> references but it's apparently triggering a kernfs_ops->release(). It
> feels like this should at least be documented in struct kernfs_ops...

Oh yeah, better documentation would be great. The core part here is that
kernfs is the layer which is implementing the revoke-like semantics
specifically to allow kernfs users (the ones that implement kernfs_ops) can
synchronously abort their involvement at will. So, from those users' POV,
->release is being called when it should be. The problem here was that PSI
was mixing objects from two layers with different lifetime rules, which
obviously causes issues.

As Suren's new fix shows, the fix is just using the matching object whose
lifetime is governed by kernfs. While this shows up in a subtle way for
poll, for all other operations, this is almost completely transprent.

Thanks.

--
tejun