Re: [RFC PATCH] net: Fix one page_pool page leak from skb_frag_unref

From: Jakub Kicinski
Date: Wed May 01 2024 - 10:24:43 EST


On Wed, 1 May 2024 00:48:43 -0700 Mina Almasry wrote:
> > 1) Revert commit 2cc3aeb5eccc
> > ("skbuff: Fix a potential race while recycling page_pool packets"). I tested
> > this btw and it works (for this specific scenario).
> >
> > 2) Revert Mina's commit a580ea994fd3 ("net: mirror skb frag ref/unref helpers")
> > for now.
>
> I vote for #1, and IIUC Jakub's feedback, he seems to prefer this as
> well.

I vote #2, actually :( Or #3 make page pool ref safe to acquire
concurrently, but that plus fixing all the places where we do crazy
things may be tricky.

Even taking the ref is not as simple as using atomic_long_inc_not_zero()
sadly, partly because we try to keep the refcount at one, in an apparent
attempt to avoid dirtying the cache line twice.

So maybe partial revert to stop be bleeding and retry after more testing
is the way to go?

I had a quick look at the code and there is also a bunch of functions
which "shift" frags from one skb to another, without checking whether
the pp_recycle state matches.