futex performance regression from "futex: Allow automatic allocation of process wide futex hash"
From: Chris Mason
Date: Tue Jun 03 2025 - 15:03:05 EST
Hi everyone,
While testing Peter's latest scheduler patches against current Linus
git, I found a pretty big performance regression with schbench:
https://github.com/masoncl/schbench
The command line I was using:
schbench -L -m 4 -M auto -t 256 -n 0 -r 60 -s 0
Bisecting the problem, I landed on this commit:
commit 7c4f75a21f636486d2969d9b6680403ea8483539 (HEAD -> update)
Author: Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx>
Date: Wed Apr 16 18:29:13 2025 +0200
futex: Allow automatic allocation of process wide futex hash
Allocate a private futex hash with 16 slots if a task forks its first
thread.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx>
Signed-off-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
Link: https://lore.kernel.org/r/20250416162921.513656-14bigeasy@xxxxxxxxxxxxx
schbench uses one futex per thread, and this command line ends up
creating 1024 worker threads (4 message threads x 256 workers each), so
the default of 16 hash buckets used by this commit is just too small.
Using 2048 buckets makes the problem go away.
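A rough back-of-the-envelope model of the bucket pressure, as a sketch only. It assumes one futex word per worker, futex words placed 64 bytes apart, and a made-up multiplicative hash standing in for the kernel's real futex hash (all hypothetical, not taken from the commit). Each futex hash bucket has its own lock and waiter list, so the per-bucket load is roughly how many workers contend on the same lock.

```python
# Hypothetical model of the schbench run above. Assumptions (not from the
# report): one futex word per worker, futex words 64 bytes apart, and a
# made-up multiplicative hash instead of the kernel's real futex hash.

MESSAGE_THREADS = 4        # schbench -m 4
WORKERS_PER_MESSAGE = 256  # schbench -t 256


def worker_count(message_threads: int, workers_per_message: int) -> int:
    """Total worker threads, i.e. futexes in use under the one-per-thread model."""
    return message_threads * workers_per_message


def bucket_loads(num_futexes: int, buckets: int, stride: int = 64) -> list[int]:
    """Count how many (hypothetical) futex addresses land in each hash bucket."""
    if buckets <= 0 or buckets & (buckets - 1):
        raise ValueError("bucket count must be a power of two")
    bits = buckets.bit_length() - 1   # log2(buckets)
    base = 0x7F0000000000             # hypothetical address of the first futex word
    loads = [0] * buckets
    for i in range(num_futexes):
        addr = base + i * stride
        # Multiplicative hash: keep the top `bits` bits of the 64-bit product.
        h = (addr * 0x9E3779B97F4A7C15) & 0xFFFFFFFFFFFFFFFF
        loads[h >> (64 - bits)] += 1
    return loads


if __name__ == "__main__":
    n = worker_count(MESSAGE_THREADS, WORKERS_PER_MESSAGE)  # 1024
    for buckets in (16, 2048):
        loads = bucket_loads(n, buckets)
        print(f"{buckets:5d} buckets: avg {n / buckets:.1f} futexes/bucket, "
              f"busiest bucket {max(loads)}")
```

Whatever the hash, 1024 futexes in 16 buckets means an average of 64.0 futexes per bucket, so the busiest bucket holds at least 64; with 2048 buckets the average drops to 0.5.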
On my big Turin system, this commit reduces RPS by 36%. Even a VM on a
Skylake machine sees a 29% drop.
schbench is a microbenchmark, so take all of this with a grain of salt,
but I think our defaults are probably too low.
-chris