Re: futex performance regression from "futex: Allow automatic allocation of process wide futex hash"

From: Chris Mason
Date: Fri Jun 06 2025 - 18:18:16 EST


On 6/6/25 3:06 AM, Sebastian Andrzej Siewior wrote:
> On 2025-06-05 20:55:27 [-0400], Chris Mason wrote:

[ ... ]

>> Going back to your diff, if we have a process growing the total number
>> of threads, can we set FH_IMMUTABLE too early? As the number of threads
>> increases, eventually we'll pick the 2x num_cpus, but that'll take a while?
>
> If you refer to the schbench diff, then set it early. Once the prctl()
> to set the size of the private hash, there will be no resize by the
> kernel.
>
> If you refer to the kernel diff where set the FH_IMMUTABLE flag, then it
> is set once the upper limit is reached (that was the plan in case I did
> the logic wrong). Which means at that point it won't increase any
> further because of the CPU limit. The only way how you can reach it too
> early is if you offline CPUs.

I pulled your immutable diff on top of c0c9379f235d. Just the immutable
diff, not the first suggestion that bumped num_online_cpus() * 2.

The RPS did not improve (~2.5M RPS). Looking at the futex_private_hash
of the process:

>>> task.mm.futex_phash
*(struct futex_private_hash *)0xff110003bbd15000 = {
	.users = (rcuref_t){
		.refcnt = (atomic_t){
			.counter = (int)0,
		},
	},
	.hash_mask = (unsigned int)15,
	.rcu = (struct callback_head){
		.next = (struct callback_head *)0x0,
		.func = (void (*)(struct callback_head *))0x0,
	},
	.mm = (void *)0xff110003321a2400,
	.custom = (bool)0,
	.immutable = (bool)1,
	.queues = (struct futex_hash_bucket []){},
}

So the hash is already immutable with hash_mask = 15, i.e. only 16 buckets.
I think it's because our calculation of the max is based on

	min(get_nr_threads(current), num_online_cpus())

get_nr_threads() starts out smaller than num_online_cpus(), so we immediately
decide we've maxed out the buckets?
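
To make the suspected sequence concrete, here is a small user-space model.
It is only a sketch, not the kernel code: struct phash_model,
model_limit() and model_alloc_default() are hypothetical names, and the
4-buckets-per-thread factor and the floor of 16 are assumptions picked so
that the result matches the hash_mask = 15 seen in the dump above.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed constants for this model only; not taken from the kernel diff. */
#define MODEL_BUCKETS_PER_THREAD 4u
#define MODEL_MIN_BUCKETS        16u

/* Hypothetical stand-in for the relevant futex_private_hash fields. */
struct phash_model {
	unsigned int buckets;	/* hash_mask would be buckets - 1 */
	bool immutable;
};

/* Round n up to the next power of two (small inputs only). */
static unsigned int roundup_pow2(unsigned int n)
{
	unsigned int p = 1;

	while (p < n)
		p <<= 1;
	return p;
}

/* The "max" as computed from the *current* thread count, capped by CPUs. */
static unsigned int model_limit(unsigned int nr_threads,
				unsigned int online_cpus)
{
	unsigned int threads, limit;

	assert(online_cpus > 0);
	threads = nr_threads < online_cpus ? nr_threads : online_cpus;
	limit = roundup_pow2(MODEL_BUCKETS_PER_THREAD * threads);
	return limit < MODEL_MIN_BUCKETS ? MODEL_MIN_BUCKETS : limit;
}

/*
 * Models the suspected problem: the hash is marked immutable as soon as
 * it reaches a limit that was derived from the current thread count, so
 * a process that starts with few threads freezes at the small size.
 */
static void model_alloc_default(struct phash_model *h,
				unsigned int nr_threads,
				unsigned int online_cpus)
{
	unsigned int limit;

	if (h->immutable)
		return;		/* no further resize once frozen */

	limit = model_limit(nr_threads, online_cpus);
	if (h->buckets < limit)
		h->buckets = limit;
	h->immutable = true;	/* "limit reached", even if threads < CPUs */
}
```

In this model, calling model_alloc_default() with 2 threads on 64 CPUs
leaves 16 buckets with immutable set, and a later call with 128 threads
changes nothing, even though model_limit(128, 64) would allow 256.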

-chris