Re: [RFC PATCH] net: Fix one page_pool page leak from skb_frag_unref

From: Mina Almasry
Date: Thu May 02 2024 - 16:09:52 EST


On Mon, Apr 29, 2024 at 8:00 AM Jakub Kicinski <kuba@xxxxxxxxxx> wrote:
>
> On Fri, 26 Apr 2024 21:24:09 -0700 Mina Almasry wrote:
> > On Fri, Apr 26, 2024 at 4:09 PM Jakub Kicinski <kuba@xxxxxxxxxx> wrote:
> > >
> > > On Thu, 25 Apr 2024 12:20:59 -0700 Mina Almasry wrote:
> > > > - if (recycle && napi_pp_get_page(page))
> > > > + if (napi_pp_get_page(page))
> > >
> > > Pretty sure you can't do that. The "recycle" here is a concurrency
> > > guarantee. A guarantee someone is holding a pp ref on that page,
> > > a ref which will not go away while napi_pp_get_page() is executing.
> >
> > I don't mean to argue, but I think the get_page()/put_page() pair we
> > do in the page ref path is susceptible to the same issue. AFAIU it's
> > not safe to get_page() if another CPU can be dropping the last ref;
> > get_page_unless_zero() should be used instead.
>

I uploaded a revert for review, but to reland I need to understand the
concern here a bit better. AFAICT the diff you're responding to is
safe, and it works very well with devmem, so it would be my preferred
approach for the reland (though there are other options if you are
convinced it's bad). FWIW, my thoughts:

> Whoever gave us the pointer to operate on has a reference, so the page
> can't disappear. get_page() is safe.

Agreed.
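
To make sure we mean the same distinction, here is a toy userspace
model using C11 atomics. The toy_* names are made up for illustration
and this is not the kernel's implementation. get_page() is an
unconditional increment that relies on the caller already holding a
ref, while get_page_unless_zero() refuses to resurrect a page whose
count has already reached zero:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy stand-in for a page refcount; NOT the kernel's struct page. */
struct toy_page {
	atomic_int refcount;
};

/* Like get_page(): unconditional increment. Only safe when the caller
 * already holds a ref, so the count cannot be zero here. */
static void toy_get_page(struct toy_page *p)
{
	atomic_fetch_add(&p->refcount, 1);
}

/* Like get_page_unless_zero(): take a ref only while someone still
 * holds one, instead of bumping a page that is being freed. */
static bool toy_get_page_unless_zero(struct toy_page *p)
{
	int old = atomic_load(&p->refcount);

	while (old != 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&p->refcount, &old, old + 1))
			return true;
	}
	return false;
}

/* Like put_page(): returns true when this call dropped the last ref. */
static bool toy_put_page(struct toy_page *p)
{
	return atomic_fetch_sub(&p->refcount, 1) == 1;
}
```

In the kernel path we're discussing, the caller's own ref is what keeps
the count above zero, which is why the plain get_page() is fine there.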

> The problem with pp is that we
> don't know whether the caller has a pp ref or a page ref. IOW the pp
> ref may not be owned by whoever called us.
>

OK, this is where I'm not sure anymore. The diff you're replying to
attempts to enforce the invariant: "if anyone wants a reference on an
skb_frag, skb_frag_ref() takes a pp ref on pp frags
(is_pp_page==true) and a page ref on non-pp frags
(is_pp_page==false)".

Additionally, the page doesn't transition from pp to non-pp or vice
versa while anyone is holding a pp ref, because
page_pool_set_pp_info() is called right after the page is obtained
from the buddy allocator (before the page pool hands it out) and
page_pool_clear_pp_info() is called only after all the pp refs are
dropped.
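
A toy model of that invariant and lifecycle, assuming a simplified page
with separate pp and page counts. The toy_* functions and fields are
hypothetical and stand in for page_pool_set_pp_info(), skb_frag_ref()
and page_pool_clear_pp_info(); this is not the real struct page or
page_pool code:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical simplified page: field names are illustrative only. */
struct toy_page {
	bool is_pp;   /* set/cleared only by the pool model below */
	int pp_ref;   /* pp refs, meaningful only while is_pp */
	int page_ref; /* ordinary page refcount */
};

/* Models page_pool_set_pp_info(): runs right after the page comes
 * from the allocator, before the pool hands it out. */
static void toy_pool_alloc(struct toy_page *p)
{
	p->is_pp = true;
	p->pp_ref = 1;
	p->page_ref = 1;
}

/* Models the intended skb_frag_ref(): a pp ref on pp frags, a page
 * ref on everything else. */
static void toy_frag_ref(struct toy_page *p)
{
	if (p->is_pp)
		p->pp_ref++;
	else
		p->page_ref++;
}

/* Matching unref. Only when the last pp ref is dropped does the model
 * clear the pp info (like page_pool_clear_pp_info()), so is_pp cannot
 * flip while any pp ref is held. Returns true if it was cleared. */
static bool toy_frag_unref(struct toy_page *p)
{
	if (!p->is_pp) {
		p->page_ref--;
		return false;
	}
	if (--p->pp_ref == 0) {
		p->is_pp = false;
		return true;
	}
	return false;
}
```

In this model is_pp only changes once pp_ref reaches zero, so a caller
holding a pp ref always sees a stable is_pp, which is the property the
steps below lean on.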

So:

1. We know the caller has a ref (otherwise get_page() wouldn't be safe
in the non-pp case).
2. We know that the page has not transitioned from pp to non-pp or
vice versa since the caller obtained the ref (from code inspection, pp
info is not changed until all the pp refs on a pp page are dropped).
3. AFAICT, it follows that if the page is pp, then the caller has a pp
ref, and if the page is non-pp, then the caller has a page ref.
4. So if is_pp_page==true, the caller holds a pp ref, and
napi_pp_get_page() should be safe against concurrent ref drops.

AFAICT the only way my mental model breaks is if some code in the core
net stack does a raw get_page() on a frag rather than skb_frag_ref().
I would like to get rid of such call sites if they exist; I think they
would not interact well with devmem (though they could be made to work
with some effort).
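
In the same toy terms (hypothetical names and simplified counts, not
kernel code), the failure mode is a raw get_page() that takes a page
ref on a pp page, while the later skb_frag_unref()-style drop, which
dispatches on is_pp, releases a pp ref instead:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical simplified page, same shape as a pp-aware model. */
struct toy_page {
	bool is_pp;
	int pp_ref;
	int page_ref;
};

/* A raw get_page(): always bumps the page refcount, ignoring is_pp. */
static void toy_raw_get_page(struct toy_page *p)
{
	p->page_ref++;
}

/* skb_frag_unref()-style drop: picks the counter by is_pp, not by
 * which kind of ref the caller actually took. */
static void toy_frag_unref(struct toy_page *p)
{
	if (p->is_pp)
		p->pp_ref--;
	else
		p->page_ref--;
}
```

After one raw get plus one frag unref on a pp page, the pp count has
lost a ref the caller never took and the page count carries one that
nobody will drop.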

--
Thanks,
Mina