Re: [net PATCH] skb: Do mix page pool and page referenced frags in GRO

From: Alexander Duyck
Date: Thu Jan 26 2023 - 14:48:26 EST


On Thu, Jan 26, 2023 at 11:14 AM Toke Høiland-Jørgensen <toke@xxxxxxxxxx> wrote:
>
> Alexander Duyck <alexander.duyck@xxxxxxxxx> writes:
>
> > From: Alexander Duyck <alexanderduyck@xxxxxx>
> >
> > GSO should not merge page pool recycled frames with standard reference
> > counted frames. Traditionally this didn't occur, at least not often.
> > However as we start looking at adding support for wireless adapters there
> > becomes the potential to mix the two due to A-MSDU repartitioning frames in
> > the receive path. There are possibly other places where this may have
> > occurred however I suspect they must be few and far between as we have not
> > seen this issue until now.
> >
> > Fixes: 53e0961da1c7 ("page_pool: add frag page recycling support in page pool")
> > Reported-by: Felix Fietkau <nbd@xxxxxxxx>
> > Signed-off-by: Alexander Duyck <alexanderduyck@xxxxxx>
>
> I know I'm pattern matching a bit crudely here, but we recently had
> another report where doing a get_page() on skb->head didn't seem to be
> enough; any chance they might be related?
>
> See: https://lore.kernel.org/r/Y9BfknDG0LXmruDu@JNXK7M3

Looking at it I wouldn't think so. Doing get_page() on these frames is
fine. In the case you reference it looks like get_page() is being
called on a slab allocated skb head. So somehow a slab allocated head
is leaking through.

What is causing the issue here is that after get_page() is being
called and the fragments are moved into a non-pp_recycle skb they are
then picked out and merged back into a pp_recycle skb. As a result
what is happening is that we are seeing a reference count leak from
pp_frag_count and into refcount.

This is the quick-n-dirty fix. I am debating if we want to expand this
so that we could support the case where the donor frame is pp_recycle
but the recipient is a standard reference counted frame. Fixing that
would essentially consist of having to add logic to take the reference
on all donor frags, making certain that nr_frags on the donor skb
isn't altered, and then lastly making sure that all cases use the
NAPI_GRO_FREE path to drop the page pool counts.