Re: [tip: locking/urgent] futex: Allow to resize the private local hash
From: Sebastian Andrzej Siewior
Date: Wed Jun 18 2025 - 12:03:46 EST
On 2025-06-17 09:11:06 [-0700], Calvin Owens wrote:
> Actually got an oops this time:
>
> Oops: general protection fault, probably for non-canonical address 0xfdd92c90843cf111: 0000 [#1] SMP
> CPU: 3 UID: 1000 PID: 323127 Comm: cargo Not tainted 6.16.0-rc2-lto-00024-g9afe652958c3 #1 PREEMPT
> Hardware name: ASRock B850 Pro-A/B850 Pro-A, BIOS 3.11 11/12/2024
> RIP: 0010:queued_spin_lock_slowpath+0x12a/0x1d0
…
> Call Trace:
> <TASK>
> futex_unqueue+0x2e/0x110
> __futex_wait+0xc5/0x130
> futex_wait+0xee/0x180
> do_futex+0x86/0x120
> __se_sys_futex+0x16d/0x1e0
> do_syscall_64+0x47/0x170
> entry_SYSCALL_64_after_hwframe+0x4b/0x53
> RIP: 0033:0x7f086e918779
The lock_ptr is pointing to invalid memory. It explodes within
queued_spin_lock_slowpath(), which looks like decode_tail() returned a
wrong pointer/offset.
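For illustration, a minimal userspace sketch of how a garbage lock word
can turn into a bogus queue node. It assumes the qspinlock layout used
for NR_CPUS < 16K (bits 16-17 tail index, bits 18-31 tail CPU + 1). The
*_sketch names and struct tail_pos are made up for this example. The
kernel's decode_tail() goes one step further and uses the CPU and index
to look up a per-CPU MCS node, which is where a nonexistent CPU number
would yield a wild pointer.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout for NR_CPUS < 16K; not copied from the kernel headers. */
#define TAIL_IDX_OFFSET 16
#define TAIL_IDX_MASK   (3u << TAIL_IDX_OFFSET)
#define TAIL_CPU_OFFSET 18

/* Hypothetical helper type: which CPU and which per-CPU node the tail names. */
struct tail_pos {
	int cpu;
	int idx;
};

/* Pack (cpu, idx) into the upper 16 bits of the lock word. */
static uint32_t encode_tail_sketch(int cpu, int idx)
{
	assert(idx >= 0 && idx < 4);
	return ((uint32_t)(cpu + 1) << TAIL_CPU_OFFSET) |
	       ((uint32_t)idx << TAIL_IDX_OFFSET);
}

/* Unpack the tail again; nothing here validates that the CPU exists. */
static struct tail_pos decode_tail_sketch(uint32_t tail)
{
	struct tail_pos pos;

	pos.cpu = (int)(tail >> TAIL_CPU_OFFSET) - 1;
	pos.idx = (int)((tail & TAIL_IDX_MASK) >> TAIL_IDX_OFFSET);
	return pos;
}
```

If lock_ptr points at memory that is not a spinlock, the "tail" read
from it is arbitrary. Decoding 0xdeadbeef with this layout gives CPU
14250, which does not exist on a desktop box.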
futex_queue() adds a local futex_q to the list and its lock_ptr points
to the hb lock. Then we do schedule(). After a successful wake the
lock_ptr is NULL, otherwise it still points to the
futex_hash_bucket::lock.
Since futex_unqueue() attempts to acquire the lock, there was no
wakeup but a timeout or a signal that ended the wait. The lock_ptr can
change during resize.
During the resize futex_rehash_private() moves the futex_q members from
the old queue to the new one. The lock is accessed within RCU and the
lock_ptr value is compared against the old value after locking. That
means it is accessed either before the rehash moved it to the new hash
bucket or afterwards.
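The lock-then-recheck scheme described above can be sketched in
userspace roughly like this. It is a simplified analogy, not the kernel
code: pthread mutexes stand in for the hb spinlocks, struct waiter
stands in for futex_q, and all *_sketch names are hypothetical. The RCU
part (keeping the old hash alive while we spin on its lock) is not
modelled, and holding both locks while switching lock_ptr is an
assumption about how the rehash behaves.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Stand-in for futex_q: only the fields needed for the sketch. */
struct waiter {
	_Atomic(pthread_mutex_t *) lock_ptr;	/* NULL once a waker dequeued us */
	int queued;
};

/* Returns 1 if we dequeued ourselves, 0 if a waker already did it. */
static int unqueue_sketch(struct waiter *w)
{
	pthread_mutex_t *lock;

	for (;;) {
		lock = atomic_load(&w->lock_ptr);
		if (!lock)
			return 0;		/* woken, nothing to unqueue */

		pthread_mutex_lock(lock);
		/* Recheck: a rehash may have moved us while we waited. */
		if (lock == atomic_load(&w->lock_ptr))
			break;
		pthread_mutex_unlock(lock);	/* stale lock, retry */
	}

	w->queued = 0;
	pthread_mutex_unlock(lock);
	return 1;
}

/* Move the waiter to a new bucket; lock_ptr changes under both locks. */
static void rehash_sketch(struct waiter *w, pthread_mutex_t *old_lock,
			  pthread_mutex_t *new_lock)
{
	pthread_mutex_lock(old_lock);
	pthread_mutex_lock(new_lock);
	atomic_store(&w->lock_ptr, new_lock);
	pthread_mutex_unlock(new_lock);
	pthread_mutex_unlock(old_lock);
}
```

With this scheme the waiter only ever proceeds while holding whichever
lock lock_ptr currently names, so it sees the entry either entirely
before or entirely after the move.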
I don't see how this pointer can become invalid. RCU protects against
cleanup and the pointer compare ensures that it is the "current"
pointer.
I've been looking at clang's assembly of futex_unqueue() and it looks
correct. And futex_rehash_private() iterates over all slots.
> This is a giant Yocto build, but the comm is always cargo, so hopefully
> I can run those bits in isolation and hit it more quickly.
If it still explodes without LTO, would you mind trying gcc?
> Thanks,
> Calvin
Sebastian