Re: [PATCH] bpf: Call cond_resched() to avoid soft lockup in trie_free()

From: Alexei Starovoitov
Date: Fri Jun 27 2025 - 15:36:58 EST


On Fri, Jun 27, 2025 at 6:20 AM Matt Fleming <matt@xxxxxxxxxxxxxxxx> wrote:
>
> On Wed, Jun 18, 2025 at 3:50 PM Alexei Starovoitov
> <alexei.starovoitov@xxxxxxxxx> wrote:
> >
> > Do your homework pls.
> > Set max_entries to 100G and report back.
> > Then set max_entries to 1G _with_ cond_resched() hack and report back.
>
> Hi,
>
> I put together a small reproducer
> https://github.com/xdp-project/bpf-examples/pull/130 which gives the
> following results on an AMD EPYC 9684X 96-Core machine:
>
> | Number of map entries | Linux 6.12.32 (baseline) | + KASAN | + cond_resched() patch |
> |-----------------------|--------------------------|---------|------------------------|
> | 1K | 0ms | 4ms | 0ms |
> | 10K | 2ms | 50ms | 2ms |
> | 100K | 32ms | 511ms | 32ms |
> | 1M | 427ms | 5478ms | 420ms |
> | 10M | 5056ms | 55714ms | 5040ms |
> | 100M | 67253ms | * | 62630ms |
>
> * - I gave up waiting after 11.5 hours
>
> Enabling KASAN makes the durations an order of magnitude bigger. The
> cond_resched() patch eliminates the soft lockups with no effect on the
> times.

Good. Now you see my point, right?
cond_resched() silences the soft lockup, but it doesn't fix the issue.
Over a minute (67s) to free a trie of 100M elements is horrible.
Time 100M kmalloc()/kfree() pairs and you'll see that slab is not the issue.
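The shape of that experiment can be sketched in userspace as below. This is only an analogy under stated assumptions: glibc malloc()/free() stand in for kmalloc()/kfree(), so it does not measure the kernel slab allocator itself. The function name alloc_free_ms() and the 64-byte object size are hypothetical choices, not anything from the original thread.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdlib.h>
#include <time.h>

/* Userspace stand-in (assumption: malloc/free in place of kmalloc/kfree):
 * allocate n objects of obj_size bytes, free them all, and return the
 * elapsed wall-clock time in milliseconds, or -1.0 if allocation fails. */
static double alloc_free_ms(size_t n, size_t obj_size)
{
	void **objs = malloc(n * sizeof(*objs));
	struct timespec t0, t1;
	size_t i;

	if (!objs)
		return -1.0;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (i = 0; i < n; i++) {
		objs[i] = malloc(obj_size);
		if (!objs[i]) {
			/* Release what was allocated before bailing out. */
			while (i > 0)
				free(objs[--i]);
			free(objs);
			return -1.0;
		}
	}
	for (i = 0; i < n; i++)
		free(objs[i]);
	clock_gettime(CLOCK_MONOTONIC, &t1);

	free(objs);
	return (double)(t1.tv_sec - t0.tv_sec) * 1e3 +
	       (double)(t1.tv_nsec - t0.tv_nsec) / 1e6;
}
```

Calling alloc_free_ms(100000000, 64) mirrors the 100M case. The pointer array alone needs about 800 MB on a 64-bit machine, on top of the objects themselves, so smaller counts are a reasonable first step.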
The trie_free() algorithm is to blame: it walks down from the root
again for every element it frees, and it doesn't need to. Fix the
root cause.
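A minimal userspace model of the difference follows. It assumes a hypothetical struct node holding only two child pointers, and it drops the RCU accessors and node layout of the real LPM trie. The first function models restarting from the root for every free, which costs up to the trie depth per element. The second is a swapped-in technique, not the kernel's code: a stack-free destructive traversal that rotates left children up and frees each node once it has no left child. Each rotation adds a node to the right spine, and a node stays there until it is freed, so the total work is linear in the node count. Both functions return a step count so the costs can be compared. build_complete() is a hypothetical test helper.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for an LPM trie node: only the children matter. */
struct node {
	struct node *child[2];
};

/* Restart-from-root teardown: descend to a childless node, free it,
 * clear the parent's slot, then start over at the root.
 * Returns the number of node visits. */
static size_t free_restart_from_root(struct node **root)
{
	size_t visits = 0;

	for (;;) {
		struct node **slot = root;

		for (;;) {
			struct node *n = *slot;

			if (!n)
				return visits; /* trie is now empty */
			visits++;
			if (n->child[0]) {
				slot = &n->child[0];
				continue;
			}
			if (n->child[1]) {
				slot = &n->child[1];
				continue;
			}
			free(n);
			*slot = NULL;
			break;
		}
	}
}

/* Single-pass teardown with O(1) extra memory: rotate child[0] up until
 * the current node has none, then free it and continue with child[1].
 * Returns the number of loop steps (rotations plus frees). */
static size_t free_single_pass(struct node **root)
{
	struct node *n = *root;
	size_t steps = 0;

	while (n) {
		struct node *l = n->child[0];

		steps++;
		if (l) {
			/* Rotation: l moves up and n becomes l's child[1]. */
			n->child[0] = l->child[1];
			l->child[1] = n;
			n = l;
		} else {
			struct node *next = n->child[1];

			free(n);
			n = next;
		}
	}
	*root = NULL;
	return steps;
}

/* Hypothetical helper: complete binary tree with 2^depth - 1 nodes. */
static struct node *build_complete(unsigned int depth)
{
	struct node *n;

	if (depth == 0)
		return NULL;
	n = malloc(sizeof(*n));
	if (!n)
		abort();
	n->child[0] = build_complete(depth - 1);
	n->child[1] = build_complete(depth - 1);
	return n;
}
```

On a complete tree of depth 10 (1023 nodes), free_restart_from_root() makes 9217 node visits. free_single_pass() needs at most 2046 steps. The gap grows with depth, and a restarting walk pays that cost for every one of the 100M elements.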