Re: Network filesystems and netmem
From: David Howells
Date: Fri Aug 08 2025 - 16:16:51 EST
Mina Almasry <almasrymina@xxxxxxxxxx> wrote:
> > (1) The socket. We might want to group allocations relating to the same
> > socket or destined to route through the same NIC together.
> >
> > (2) The destination address. Again, we might need to group by NIC. For TCP
> > sockets, this likely doesn't matter as a connected TCP socket already
> > knows this, but for a UDP socket, you can set that in sendmsg() (and
> > indeed AF_RXRPC does just that).
> >
>
> the page_pool model groups memory by NIC (struct netdev), not socket
> or destination address. It may be feasible to extend it to be
> per-socket, but I don't immediately understand what that entails
> exactly. The page_pool uses the netdev for DMA mapping; I'm not sure
> what it would use the socket or destination address for (unless it's
> to grab the netdev :P).
Yeah, but the network filesystem doesn't necessarily know which NIC will be
used, whereas a connected TCP socket does. Likewise, a UDP socket has to
perform a route lookup on the destination address, and that lookup yields the
NIC.
So basically all three (the socket, the address and the flag) would be hints,
possibly unused for now.
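To illustrate "hints, possibly unused for now": the allocator could accept all three and use whichever one identifies a NIC. The following is a minimal userspace sketch under stated assumptions. `struct frag_alloc_hints`, `pick_pool_ifindex()` and `fake_route_lookup()` are hypothetical names, not kernel API. The route lookup is a stub, the destination is simplified to an IPv4 address, and the flag is accepted but ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hint bundle; none of these names are existing kernel API. */
struct frag_alloc_hints {
	int sock_ifindex;      /* NIC known from a connected socket, or -1 */
	uint32_t dest_addr;    /* IPv4 destination (host order), or 0 if unknown */
	unsigned int flags;    /* caller-supplied flag; meaning left open */
};

/* Stub standing in for a real route lookup (hypothetical):
 * 10.0.0.0/8 routes via ifindex 2, everything else via ifindex 1. */
static int fake_route_lookup(uint32_t dest_addr)
{
	return (dest_addr >> 24) == 10 ? 2 : 1;
}

/* Pick the per-NIC pool to allocate from.  Returns 0 ("no preference")
 * when no hint identifies a NIC; the flags are accepted but unused. */
static int pick_pool_ifindex(const struct frag_alloc_hints *h)
{
	if (!h)
		return 0;
	if (h->sock_ifindex >= 0)
		return h->sock_ifindex;               /* connected socket knows */
	if (h->dest_addr)
		return fake_route_lookup(h->dest_addr); /* UDP: resolve the route */
	return 0;
}
```

Because the hints are optional, a caller that knows nothing (a NULL pointer) still gets an allocation from a default pool, so the interface can be introduced before any hint is actually acted on.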
> Today the page_pool doesn't really care how long you hold onto the mem
> allocated from it.
It's not so much whether the page pool cares how long we hold on to the
memory. Rather, for a fragment allocator we want to group together things of
similar lifetime, since we can't reuse a page until everything in it has been
released.
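The lifetime-grouping point can be sketched as a fragment allocator that keeps one open page per lifetime class, so that a single long-lived fragment cannot pin a page full of short-lived ones. This is a hypothetical userspace illustration, not the page_pool or page_frag API. The two lifetime classes, the 4096-byte page, the `pages_freed` counter and all function names are assumptions, and alignment is ignored for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define FRAG_PAGE_SIZE 4096
#define NR_LIFETIME_CLASSES 2   /* hypothetical: 0 = short-lived, 1 = long-lived */

struct frag_page {
	size_t used;                /* bytes carved out so far */
	int refs;                   /* live fragments, +1 while the page is open */
	unsigned char data[FRAG_PAGE_SIZE];
};

struct frag_cache {
	struct frag_page *open[NR_LIFETIME_CLASSES]; /* one open page per class */
};

static int pages_freed;         /* instrumentation for this example only */

static void frag_page_put(struct frag_page *p)
{
	if (--p->refs == 0) {       /* last user gone: page can be reused */
		free(p);
		pages_freed++;
	}
}

/* Carve len bytes from the open page of lifetime class cls, so only
 * fragments of the same class share a page.  *pagep gets the owner. */
static void *frag_alloc(struct frag_cache *c, int cls, size_t len,
			struct frag_page **pagep)
{
	struct frag_page *p = c->open[cls];
	void *ptr;

	if (len == 0 || len > FRAG_PAGE_SIZE)
		return NULL;
	if (!p || p->used + len > FRAG_PAGE_SIZE) {
		struct frag_page *np = calloc(1, sizeof(*np));

		if (!np)
			return NULL;
		np->refs = 1;           /* the cache's own reference */
		if (p)
			frag_page_put(p);   /* close the full page */
		c->open[cls] = p = np;
	}
	ptr = p->data + p->used;
	p->used += len;
	p->refs++;
	*pagep = p;
	return ptr;
}

static void frag_free(struct frag_page *p)
{
	frag_page_put(p);
}

/* Drop the cache's references so fully released pages can go. */
static void frag_cache_flush(struct frag_cache *c)
{
	for (int i = 0; i < NR_LIFETIME_CLASSES; i++) {
		if (c->open[i]) {
			frag_page_put(c->open[i]);
			c->open[i] = NULL;
		}
	}
}
```

If the short- and long-lived fragments were packed into the same page instead, freeing all the short-lived ones would still leave the page pinned by the long-lived one.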
And if we're doing bulk DMA/IOMMU mapping, we potentially have a second
constraint: an IOMMU TLB entry may be keyed to a particular device.
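One way to honour that second constraint is to key each pool on both the device and the lifetime class, so a region bulk-mapped for one device is never handed out for another. The following is a minimal sketch under assumptions. `struct pool_key`, `pool_get()`, the fixed-size table and `MAX_POOLS` are hypothetical, and the actual DMA mapping is represented only by a size field.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_POOLS 8             /* hypothetical fixed table size */

/* A bulk-mapped region is only usable by the device it was mapped for,
 * so the device joins the lifetime class in the pool key. */
struct pool_key {
	int dev_id;                 /* device the region's mapping belongs to */
	int cls;                    /* lifetime class */
};

struct pool {
	struct pool_key key;
	size_t mapped_bytes;        /* size of the bulk mapping (bookkeeping only) */
	int in_use;
};

static struct pool pools[MAX_POOLS];

/* Return the pool for (dev_id, cls), creating one (i.e. one new bulk
 * mapping) if none exists yet.  Returns NULL when the table is full. */
static struct pool *pool_get(int dev_id, int cls, size_t map_size)
{
	struct pool *free_slot = NULL;

	for (int i = 0; i < MAX_POOLS; i++) {
		struct pool *p = &pools[i];

		if (p->in_use && p->key.dev_id == dev_id && p->key.cls == cls)
			return p;
		if (!p->in_use && !free_slot)
			free_slot = p;
	}
	if (free_slot) {
		free_slot->key.dev_id = dev_id;
		free_slot->key.cls = cls;
		free_slot->mapped_bytes = map_size; /* real code would map here */
		free_slot->in_use = 1;
	}
	return free_slot;
}
```

The cost of this keying is more, smaller pools when traffic is spread across several NICs, which is one reason the socket and address hints above could matter.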
> Honestly the subject of whether to extend the page_pool or implement a
> new allocator kinda comes up every once in a while.
Are the netmem page pools actually used only for receiving? If so, do I need
to manage this myself, i.e. provide my own fragment allocator that handles
bulk DMA mapping? I'd prefer to use an existing one if I can.
David