Re: [PATCH] net: page_pool: fix refcounting issues with fragmented allocation

From: Alexander Duyck
Date: Thu Jan 26 2023 - 10:41:35 EST


On Thu, Jan 26, 2023 at 2:32 AM Ilias Apalodimas
<ilias.apalodimas@xxxxxxxxxx> wrote:
>
> Hi Alexander,
>
> Sorry for being late to the party, was overloaded...
>
> On Tue, Jan 24, 2023 at 07:57:35AM -0800, Alexander H Duyck wrote:
> > On Tue, 2023-01-24 at 16:11 +0200, Ilias Apalodimas wrote:
> > > Hi Felix,
> > >
> > > ++cc Alexander and Yunsheng.
> > >
> > > Thanks for the report
> > >
> > > On Tue, 24 Jan 2023 at 14:43, Felix Fietkau <nbd@xxxxxxxx> wrote:
> > > >
> > > > While testing fragmented page_pool allocation in the mt76 driver, I was able
> > > > to reliably trigger page refcount underflow issues, which did not occur with
> > > > full-page page_pool allocation.
> > > > It appears to me, that handling refcounting in two separate counters
> > > > (page->pp_frag_count and page refcount) is racy when page refcount gets
> > > > incremented by code dealing with skb fragments directly, and
> > > > page_pool_return_skb_page is called multiple times for the same fragment.
> > > >
> > > > Dropping page->pp_frag_count and relying entirely on the page refcount makes
> > > > these underflow issues and crashes go away.
> > > >
> > >
> > > This has been discussed here [1]. TL;DR changing this to page
> > > refcount might blow up in other colorful ways. Can we look closer and
> > > figure out why the underflow happens?
> > >
> > > [1] https://lore.kernel.org/netdev/1625903002-31619-4-git-send-email-linyunsheng@xxxxxxxxxx/
> > >
> > > Thanks
> > > /Ilias
> > >
> > >
> >
> > The logic should be safe in terms of the page pool itself as it should
> > be holding one reference to the page while the pp_frag_count is non-
> > zero. That one reference is what keeps the two halfs in sync as the
> > page shouldn't be able to be freed until we exhaust the pp_frag_count.
>
> Do you remember why we decided to go with the fragment counter instead of
> page references?

The issue has to do with when to destroy the mappings. Basically with
the fragment counter we destroy the mappings and remove the page from
the pool when the count hits 0. The reference count is really used for
the page allocator to do its tracking. If we end up trying to merge
the two the problem becomes one of lifetimes as we wouldn't know when
to destroy the DMA mappings as they would have to live the full life
of the page.

> >
> > To have an underflow there are two possible scenarios. One is that
> > either put_page or free_page is being called somewhere that the
> > page_pool freeing functions should be used.
>
> Wouldn't that affect the non fragmented path as well? IOW the driver that
> works with a full page would crash as well.

The problem is the non-fragmented path doesn't get as noisy. Also
there aren't currently any wireless drivers making use of the page
pool, or at least that is my understanding. I'm suspecting something
like the issue we saw in 1effe8ca4e34c ("skbuff: fix coalescing for
page_pool fragment recycling"). We likely have some corner case where
we should be taking a page reference and clearing a pp_recycle flag.