Re: Network filesystems and netmem
From: Mina Almasry
Date: Fri Aug 08 2025 - 19:28:50 EST
On Fri, Aug 8, 2025 at 1:16 PM David Howells <dhowells@xxxxxxxxxx> wrote:
>
> Mina Almasry <almasrymina@xxxxxxxxxx> wrote:
>
> > > (1) The socket. We might want to group allocations relating to the same
> > > socket or destined to route through the same NIC together.
> > >
> > > (2) The destination address. Again, we might need to group by NIC. For TCP
> > > sockets, this likely doesn't matter as a connected TCP socket already
> > > knows this, but for a UDP socket, you can set that in sendmsg() (and
> > > indeed AF_RXRPC does just that).
> > >
> >
> > The page_pool model groups memory by NIC (struct netdev), not by
> > socket or destination address. It may be feasible to extend it to be
> > per-socket, but I don't immediately see what that entails. The
> > page_pool uses the netdev for DMA mapping; I'm not sure what it would
> > use the socket or destination address for (unless it's to grab the
> > netdev :P).
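For reference, the per-NIC grouping is visible in how an RX driver sets
up a pool. A minimal sketch, assuming the field names in
include/net/page_pool/types.h (the header and fields have moved around
across kernel versions):

#include <net/page_pool/helpers.h>

/* Sketch: tie a page_pool to one NIC. The pool DMA-maps its pages
 * against the netdev's parent device, which is why pools are
 * naturally per-NIC (per-RX-queue, in practice).
 */
static struct page_pool *rxq_create_pool(struct net_device *netdev,
					 unsigned int pool_size)
{
	struct page_pool_params pp = {
		.flags     = PP_FLAG_DMA_MAP,    /* pool does dma_map/unmap */
		.order     = 0,
		.pool_size = pool_size,
		.nid       = NUMA_NO_NODE,
		.dev       = netdev->dev.parent, /* device DMA is mapped against */
		.dma_dir   = DMA_FROM_DEVICE,    /* RX-only today */
	};

	return page_pool_create(&pp);            /* ERR_PTR() on failure */
}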
>
> Yeah - but the network filesystem doesn't necessarily know anything about what
> NIC would be used, whereas a connected TCP socket surely does. Likewise, a UDP
> socket has to perform an address lookup to find the destination/route and thus
> the NIC.
>
> So, basically, all three - the socket, the address and the flag - would be
> hints, possibly unused for now.
>
> > Today the page_pool doesn't really care how long you hold onto the mem
> > allocated from it.
>
> It's not so much whether the page pool cares how long we hold on to the mem,
> but that for a fragment allocator we want to group together things of similar
> lifetime, as we don't get to reuse the page until all the things in it have
> been released.
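To make the lifetime point concrete: the frag API we have today hands
out sub-page fragments, and the underlying page only returns to the
pool once every fragment carved from it has been dropped. A minimal
sketch (helper names from include/net/page_pool/helpers.h, modulo
kernel version):

/* Sketch: carve a fragment out of a page_pool page. Consecutive
 * calls may share one underlying page; that page is recycled only
 * when *all* of its fragments have been released, so mixing long-
 * and short-lived allocations in one pool pins pages.
 */
static void *pool_alloc_frag(struct page_pool *pool, unsigned int size,
			     dma_addr_t *dma)
{
	unsigned int offset;
	struct page *page;

	page = page_pool_dev_alloc_frag(pool, &offset, size);
	if (!page)
		return NULL;

	*dma = page_pool_get_dma_addr(page) + offset;
	return page_address(page) + offset;
}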
>
> And if we're doing bulk DMA/IOMMU mapping, we also potentially have a second
> constraint: an IOMMU TLB entry may be keyed for a particular device.
>
> > Honestly, the subject of whether to extend the page_pool or implement
> > a new allocator kinda comes up every once in a while.
>
> Do we actually use the netmem page pools only for receiving? If that's the
> case, then do I need to be managing this myself? Providing my own fragment
> allocator that handles bulk DMA mapping, that is. I'd prefer to use an
> existing one if I can.
>
Yes, we only use page_pools for receiving at the moment. There was some
discussion in the past about using the page_pool for normal TX
networking, but I can't find the thread.

Off the top of my head, I'm unsure what it would take to make it
compatible with a TX path. At the very least, the page_pool currently
has some dependency/logic on the napi id it may get from the driver,
which may need to be factored out; see all the places we touch
pool->p.napi in page_pool.c and other files. Or, like you said, you may
want your own fragment allocator if wrestling the page_pool into doing
what you want is too cumbersome.
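Roughly the shape of that napi coupling - a sketch, not the exact
upstream code: on free, the pool only takes the lockless "direct"
recycle path if it is running in the softirq context that owns
pool->p.napi, so a TX-side pool created with .napi = NULL would always
fall back to the slower ptr_ring path:

/* Sketch (not the exact upstream logic) of the napi dependency. */
static bool pool_can_recycle_direct(const struct page_pool *pool)
{
	const struct napi_struct *napi = READ_ONCE(pool->p.napi);

	/* A pool with .napi = NULL (as a TX-side user's likely would
	 * be) never qualifies; frees still work, just via the locked
	 * slow path.
	 */
	return napi && READ_ONCE(napi->list_owner) == smp_processor_id();
}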
--
Thanks,
Mina