Runqueue spinlock recursion on arm64 v4.15

From: Mark Rutland
Date: Fri Feb 02 2018 - 14:27:41 EST


Hi all,

While fuzzing arm64 v4.15 with Syzkaller, I intermittently hit spinlock
recursion splats, around one or two a day. This looks like the same
issue Sasha hit on x86 a number of months back [1].

I've been seeing this for a few releases, but haven't had any luck
tracking it down, or generating a minimized reproducer.

I've dumped my kernel config, Syzkaller logs and reports to my
kernel.org webspace [2].

The most common case looks like:

--------
BUG: spinlock recursion on CPU#1, syz-executor1/1521
lock: 0xffff80001a764080, .magic: dead4ead, .owner: syz-executor1/1521, .owner_cpu: 0
CPU: 1 PID: 1521 Comm: syz-executor1 Not tainted 4.15.0 #3
Hardware name: linux,dummy-virt (DT)
Call trace:
dump_backtrace+0x0/0x330 arch/arm64/kernel/time.c:52
show_stack+0x20/0x30 arch/arm64/kernel/traps.c:151
__dump_stack lib/dump_stack.c:17 [inline]
dump_stack+0xd0/0x120 lib/dump_stack.c:53
spin_dump+0x150/0x1f0 kernel/locking/spinlock_debug.c:67
spin_bug kernel/locking/spinlock_debug.c:75 [inline]
debug_spin_lock_before kernel/locking/spinlock_debug.c:84 [inline]
do_raw_spin_lock+0x1e4/0x250 kernel/locking/spinlock_debug.c:112
__raw_spin_lock include/linux/spinlock_api_smp.h:143 [inline]
_raw_spin_lock+0x44/0x50 kernel/locking/spinlock.c:144
__task_rq_lock+0xc0/0x288 kernel/sched/core.c:103
wake_up_new_task+0x384/0x798 kernel/sched/core.c:2465
_do_fork+0x1b4/0xa78 kernel/fork.c:2069
SYSC_clone kernel/fork.c:2154 [inline]
SyS_clone+0x48/0x60 kernel/fork.c:2132
el0_svc_naked+0x20/0x24
--------

... though I see this in other paths, e.g. one identical to Sasha's
report:

--------
BUG: spinlock recursion on CPU#0, kworker/u9:4/7741
lock: 0xffff80001a783080, .magic: dead4ead, .owner: kworker/u9:4/7741, .owner_cpu: 1
CPU: 0 PID: 7741 Comm: kworker/u9:4 Not tainted 4.15.0 #3
Hardware name: linux,dummy-virt (DT)
Workqueue: events_unbound call_usermodehelper_exec_work
Call trace:
dump_backtrace+0x0/0x330 arch/arm64/kernel/time.c:52
show_stack+0x20/0x30 arch/arm64/kernel/traps.c:151
__dump_stack lib/dump_stack.c:17 [inline]
dump_stack+0xd0/0x120 lib/dump_stack.c:53
spin_dump+0x150/0x1f0 kernel/locking/spinlock_debug.c:67
spin_bug kernel/locking/spinlock_debug.c:75 [inline]
debug_spin_lock_before kernel/locking/spinlock_debug.c:84 [inline]
do_raw_spin_lock+0x1e4/0x250 kernel/locking/spinlock_debug.c:112
__raw_spin_lock include/linux/spinlock_api_smp.h:143 [inline]
_raw_spin_lock+0x44/0x50 kernel/locking/spinlock.c:144
rq_lock kernel/sched/sched.h:1766 [inline]
ttwu_queue kernel/sched/core.c:1863 [inline]
try_to_wake_up+0x6c0/0xa58 kernel/sched/core.c:2078
default_wake_function+0x30/0x50 kernel/sched/core.c:3628
__wake_up_common+0x128/0x470 kernel/sched/wait.c:97
__wake_up_locked+0x18/0x20 kernel/sched/wait.c:158
complete+0x68/0x90 kernel/sched/completion.c:39
umh_complete+0x40/0xa8 kernel/umh.c:55
call_usermodehelper_exec_sync kernel/umh.c:152 [inline]
call_usermodehelper_exec_work+0x160/0x240 kernel/umh.c:175
process_one_work+0x590/0xe90 kernel/workqueue.c:2113
worker_thread+0x3b0/0xd30 kernel/workqueue.c:2247
kthread+0x2a4/0x378 kernel/kthread.c:238
ret_from_fork+0x10/0x18 arch/arm64/kernel/entry.S:994
--------

... in some cases, owner_cpu is -1, so I guess we're racing with an
unlock. I only ever see this on the runqueue locks in wake up functions.

I'm at a loss as to how we can see a stale owner value that matches
current. If current held the lock in the past, it would have stored -1
to owner first, and at least this should be visible.

Any ideas?

My best guess so far is that this is happening when a task_struct gets
freed and reallocated, and we're somehow missing a release-acquire
relationship. i.e. task A releases the lock on cpu X, dies, then A's
task_struct gets reallocated for task B cpu Y, without all of A's stores
on cpu X having been made visible to B on cpu Y.

>From a scan of the kmem_cache code, that seems unlikely, though. AFAICT
free and allocation have to occur on the same CPU, so that ordering
should be given by the combination of local CPU order and whatever
barriers switch_to() has...

Maybe it's possible to detect that by hacking a task generation number
into each task struct, and embedding this into the lock's owner field. I
can try hacking that up next week.

Thanks,
Mark.

[1] https://lkml.kernel.org/r/20170604070911.5sx66m5xnoh2amyd@sasha-lappy
[2] https://www.kernel.org/pub/linux/kernel/people/mark/bugs/20180202-rq-spinlock-recursion/