Re: [RFC PATCH] net: Fix one page_pool page leak from skb_frag_unref

From: Jakub Kicinski
Date: Wed May 01 2024 - 10:28:51 EST


On Wed, 1 May 2024 07:24:34 -0700 Jakub Kicinski wrote:
> I vote #2, actually :( Or #3 make page pool ref safe to acquire
> concurrently, but that plus fixing all the places where we do crazy
> things may be tricky.
>
> Even taking the ref is not as simple as using atomic_long_inc_not_zero()
> sadly, partly because we try to keep the refcount at one, in an apparent
> attempt to avoid dirtying the cache line twice.
>
> So maybe partial revert to stop the bleeding and retry after more testing
> is the way to go?
>
> I had a quick look at the code and there is also a bunch of functions
> which "shift" frags from one skb to another, without checking whether
> the pp_recycle state matches.

BTW these two refs seem to look at the wrong skb:

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 0c8b82750000..afd3336928d0 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -2148,7 +2148,7 @@ struct sk_buff *__pskb_copy_fclone(struct sk_buff *skb, int headroom,
 		}
 		for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
 			skb_shinfo(n)->frags[i] = skb_shinfo(skb)->frags[i];
-			skb_frag_ref(skb, i);
+			skb_frag_ref(n, i);
 		}
 		skb_shinfo(n)->nr_frags = i;
 	}
@@ -5934,7 +5934,7 @@ bool skb_try_coalesce(struct sk_buff *to, struct sk_buff *from,
 	 * since we set nr_frags to 0.
 	 */
 	for (i = 0; i < from_shinfo->nr_frags; i++)
-		__skb_frag_ref(&from_shinfo->frags[i], from->pp_recycle);
+		__skb_frag_ref(&from_shinfo->frags[i], to->pp_recycle);
 
 	to->truesize += delta;
 	to->len += len;