Re: [RFC PATCH] mm: net: disable kswapd for high-order network buffer allocation
From: Eric Dumazet
Date: Tue Oct 14 2025 - 04:25:17 EST
On Tue, Oct 14, 2025 at 1:17 AM Barry Song <21cnbao@xxxxxxxxx> wrote:
>
> On Tue, Oct 14, 2025 at 3:01 PM Eric Dumazet <edumazet@xxxxxxxxxx> wrote:
> >
> > On Mon, Oct 13, 2025 at 11:43 PM Barry Song <21cnbao@xxxxxxxxx> wrote:
> > >
> > > > >
> > > > > A problem with the existing sysctl is that it only covers the TX path;
> > > > > for the RX path, we also observe that kswapd consumes significant power.
> > > > > I could add the patch below to make it support the RX path, but it feels
> > > > > like a bit of a layer violation, since the RX path code resides in mm
> > > > > and is intended to serve generic users rather than networking, even
> > > > > though the current callers are primarily network-related.
> > > >
> > > > You might have a buggy driver.
> > >
> > > We are observing the RX path as follows:
> > >
> > > do_softirq
> > > tasklet_hi_action
> > > kalPacketAlloc
> > > __netdev_alloc_skb
> > > page_frag_alloc_align
> > > __page_frag_cache_refill
> > >
> > > This appears to be a fairly common stack.
> > >
> > > So is it a buggy driver?
> >
> > No idea, kalPacketAlloc is not in upstream trees.
> >
> > It apparently needs high-order allocations. It will fail at some point.
> >
> > >
> > > >
> > > > High performance drivers use order-0 allocations only.
> > > >
> > >
> > > Do you have an example of high-performance drivers that use only order-0 memory?
> >
> > Just about all drivers using XDP and/or napi_get_frags().
> >
> > XDP has been using order-0 pages from the very beginning.
>
> Thanks! But there are still many drivers using netdev_alloc_skb(); we
> shouldn't overlook them, right?
>
> net % git grep netdev_alloc_skb | wc -l
> 359

Only the ones that are using 16KB allocations, like some WAN drivers :)
Some networks use MTU=9000.
If the hardware does not provide SG support on receive, a kmalloc()-based
buffer will use 16KB of memory, because the roughly 9KB request is rounded
up to the next power of two.
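A quick way to see where the 16KB comes from, as a sketch under assumed overheads: the 64-byte headroom and 320-byte skb_shared_info figures below are rough approximations picked for illustration, not values from a particular kernel build, and the helper names are hypothetical. As I understand it, kmalloc() serves such sizes either from power-of-two caches or, above a couple of pages, straight from the page allocator, which also rounds up to a power-of-two number of pages.

```c
#include <assert.h>
#include <stddef.h>

/* ASSUMED, approximate overheads (not exact kernel values). */
#define ASSUMED_HEADROOM    64u   /* reserved in front of the packet */
#define ASSUMED_SHINFO_SIZE 320u  /* skb_shared_info at the buffer tail */
#define ETH_HDR_LEN         14u

/* Hypothetical helper: bytes for one linear RX buffer holding a full frame. */
static size_t rx_buffer_size(size_t mtu)
{
	return ASSUMED_HEADROOM + ETH_HDR_LEN + mtu + ASSUMED_SHINFO_SIZE;
}

/* Model of a power-of-two size class: round n up to the next power of two. */
static size_t pow2_roundup(size_t n)
{
	size_t p = 1;

	assert(n > 0);
	while (p < n)
		p <<= 1;
	return p;
}
```

With these assumptions, rx_buffer_size(9000) is 9398 bytes and pow2_roundup() turns that into 16384, so roughly 7KB per packet is padding. For MTU 1500 the same helpers give 1898 bytes, rounded to 2048.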
By using a frag allocator, we can pack 3 allocations per 32KB instead of 2.
TCP can go 50% faster.
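A hypothetical userspace model of the packing argument, assuming each MTU=9000 receive buffer needs about 9.4KB (9398 bytes) once headroom and skb_shared_info are added; that size is an approximation, not an exact kernel figure. The frag cache is modeled as a bump allocator over one 32KB chunk; alignment and the kernel's reference counting are deliberately left out, and all names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: one 32KB chunk, fragments carved by advancing an offset. */
#define MODEL_CHUNK_SIZE 32768u

struct frag_cache {
	size_t used; /* bytes already handed out from the chunk */
};

/* Carve 'size' bytes from the chunk; returns 0 once it no longer fits. */
static int frag_alloc(struct frag_cache *fc, size_t size)
{
	assert(size > 0);
	if (fc->used + size > MODEL_CHUNK_SIZE)
		return 0;
	fc->used += size;
	return 1;
}

/* Number of 'size'-byte buffers that one fresh chunk can hold. */
static unsigned int frags_per_chunk(size_t size)
{
	struct frag_cache fc = { .used = 0 };
	unsigned int n = 0;

	while (frag_alloc(&fc, size))
		n++;
	return n;
}
```

frags_per_chunk(9398) returns 3, while 16KB kmalloc()-sized buffers give frags_per_chunk(16384) == 2. Per packet that is 32768 / 3, about 10923 bytes, against 16384 bytes, a 1.5x ratio. Tying this to the 50% TCP figure assumes throughput is bounded by receive memory charged per buffer (skb truesize); that link is an assumption.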
If memory is short, it will fail no matter what.
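A simplified userspace sketch of why a driver needing ~9.4KB linear buffers fails under fragmentation. It follows my reading of __page_frag_cache_refill() in recent kernels: one opportunistic order-3 (32KB) attempt without direct reclaim or retries, then a fallback to a single order-0 page, from which a fragment larger than PAGE_SIZE cannot be served. The struct and helper names are hypothetical. As far as I can tell, the opportunistic attempt keeps __GFP_KSWAPD_RECLAIM for GFP_ATOMIC callers, which is presumably why kswapd gets woken on this path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE  4096u
#define MODEL_CHUNK_SIZE 32768u /* order-3 with 4KB pages */

/* Hypothetical stand-in for allocator state; not a kernel API. */
struct mem_state {
	bool order3_free; /* can a contiguous 32KB block be had cheaply? */
	bool order0_free; /* is any single 4KB page available? */
};

/* Refill: try one order-3 chunk first (in the kernel, as I read it, without
 * direct reclaim or retries), else fall back to one order-0 page.
 * Returns the usable chunk size, or 0 if nothing could be allocated. */
static size_t refill_chunk(const struct mem_state *m)
{
	if (m->order3_free)
		return MODEL_CHUNK_SIZE;
	if (m->order0_free)
		return MODEL_PAGE_SIZE;
	return 0;
}

/* Can a fragment of 'fragsz' bytes be served from a freshly refilled chunk? */
static bool frag_alloc_ok(const struct mem_state *m, size_t fragsz)
{
	size_t chunk = refill_chunk(m);

	assert(fragsz > 0);
	return chunk != 0 && fragsz <= chunk;
}
```

In this model a 1600-byte fragment survives the order-0 fallback, a 9398-byte one does not, and with no free pages at all every request fails.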