Re: usb/gadget: potential deadlock in gadgetfs_suspend

From: Andrey Konovalov
Date: Mon Jun 12 2017 - 13:26:01 EST


On Mon, Jun 12, 2017 at 6:11 PM, Andrey Konovalov <andreyknvl@xxxxxxxxxx> wrote:
> On Fri, Jun 9, 2017 at 9:14 PM, Alan Stern <stern@xxxxxxxxxxxxxxxxxxx> wrote:
>> On Fri, 9 Jun 2017, Andrey Konovalov wrote:
>>
>>> On Fri, Jun 9, 2017 at 2:41 PM, Andrey Konovalov <andreyknvl@xxxxxxxxxx> wrote:
>>> > Hi,
>>> >
>>> > I'm getting some hangs while fuzzing the kernel with syzkaller.
>>> >
>>> > Possibly it happens during the execution of the following syzkaller program:
>>> >
>>> > mmap(&(0x7f0000000000/0xb90000)=nil, (0xb90000), 0x3, 0x32,
>>> > 0xffffffffffffffff, 0x0)
>>> > r0 = open$usb(&(0x7f0000001000)="2f6465762f6761646765742f64756d6d795f75646300",
>>> > 0xc002, 0x0)
>>> > r1 = open$usb(&(0x7f0000002000)="2f6465762f6761646765742f64756d6d795f75646300",
>>> > 0x1, 0x102)
>>> > write$usb(r1, &(0x7f0000003000)={0x0, {0x9, 0x2, 0x1b, 0x0, 0x5, 0x0,
>>> > 0x80, 0x8, 0x9, 0x4, 0x10000000, 0xfffffefffffffff9, 0x1, 0xff, 0x0,
>>
>> I don't understand these large constants. They're supposed to be __u8
>> values. Do they get truncated to the least significant byte?
>
> This program doesn't lead to crashes, it was a bug in syzkaller. The
> format keeps changing, I'll explain it if I send another program.
>
>>
>>> > 0x8, 0x80, [{0x9, 0x5, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}]}, {0x12, 0x1,
>>> > 0x0, 0x0, 0x4b5, 0x7c, 0x0, 0x3, 0x4, 0x0, 0x8, 0xd686, 0x0, 0x1}},
>>> > 0x31)
>>> >
>>> > I haven't managed to get the exact same stack trace (or any at all
>>> > actually) while trying to reproduce the bug with this program, but the
>>> > kernel definitely hangs.
>
> It seems that I get the stall reports quite rarely and I can't
> reproduce them. However I get the "bad spinlock magic" crashes quite
> often and able to reproduce them. So I can apply debug patches.
>
> Reproducing sometimes requires quite some time (~10 minutes), so this
> seems to be some kind of race. I noticed, that often I unmount
> gadgetfs before the USB device has enough time to properly initialize,
> so this might be a race of unmount vs device initialization or
> something similar.

Aha, got KASAN report:

==================================================================
BUG: KASAN: use-after-free in __lock_acquire+0x3069/0x3690
kernel/locking/lockdep.c:3246
Read of size 8 at addr ffff88003a2bdaf8 by task kworker/3:1/903

CPU: 3 PID: 903 Comm: kworker/3:1 Not tainted 4.12.0-rc4+ #35
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Workqueue: usb_hub_wq hub_event
Call Trace:
__dump_stack lib/dump_stack.c:16 [inline]
dump_stack+0x292/0x395 lib/dump_stack.c:52
print_address_description+0x78/0x280 mm/kasan/report.c:252
kasan_report_error mm/kasan/report.c:351 [inline]
kasan_report+0x230/0x340 mm/kasan/report.c:408
__asan_report_load8_noabort+0x19/0x20 mm/kasan/report.c:429
__lock_acquire+0x3069/0x3690 kernel/locking/lockdep.c:3246
lock_acquire+0x22d/0x560 kernel/locking/lockdep.c:3855
__raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
_raw_spin_lock+0x2f/0x40 kernel/locking/spinlock.c:151
spin_lock include/linux/spinlock.h:299 [inline]
gadgetfs_suspend+0x89/0x130 drivers/usb/gadget/legacy/inode.c:1682
set_link_state+0x88e/0xae0 drivers/usb/gadget/udc/dummy_hcd.c:455
dummy_hub_control+0xd7e/0x1fb0 drivers/usb/gadget/udc/dummy_hcd.c:2074
rh_call_control drivers/usb/core/hcd.c:689 [inline]
rh_urb_enqueue drivers/usb/core/hcd.c:846 [inline]
usb_hcd_submit_urb+0x92f/0x20b0 drivers/usb/core/hcd.c:1650
usb_submit_urb+0x8b2/0x12c0 drivers/usb/core/urb.c:542
usb_start_wait_urb+0x148/0x5b0 drivers/usb/core/message.c:56
usb_internal_control_msg drivers/usb/core/message.c:100 [inline]
usb_control_msg+0x341/0x4d0 drivers/usb/core/message.c:151
usb_clear_port_feature+0x74/0xa0 drivers/usb/core/hub.c:412
hub_port_disable+0x123/0x510 drivers/usb/core/hub.c:4177
hub_port_init+0x1ed/0x2940 drivers/usb/core/hub.c:4648
hub_port_connect drivers/usb/core/hub.c:4826 [inline]
hub_port_connect_change drivers/usb/core/hub.c:4999 [inline]
port_event drivers/usb/core/hub.c:5105 [inline]
hub_event+0x1ae1/0x3d40 drivers/usb/core/hub.c:5185
process_one_work+0xc08/0x1bd0 kernel/workqueue.c:2097
process_scheduled_works kernel/workqueue.c:2157 [inline]
worker_thread+0xb2b/0x1860 kernel/workqueue.c:2233
kthread+0x363/0x440 kernel/kthread.c:231
ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:424

Allocated by task 9958:
save_stack_trace+0x1b/0x20 arch/x86/kernel/stacktrace.c:59
save_stack+0x43/0xd0 mm/kasan/kasan.c:513
set_track mm/kasan/kasan.c:525 [inline]
kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:617
kmem_cache_alloc_trace+0x87/0x280 mm/slub.c:2745
kmalloc include/linux/slab.h:492 [inline]
kzalloc include/linux/slab.h:665 [inline]
dev_new drivers/usb/gadget/legacy/inode.c:170 [inline]
gadgetfs_fill_super+0x24f/0x540 drivers/usb/gadget/legacy/inode.c:1993
mount_single+0xf6/0x160 fs/super.c:1192
gadgetfs_mount+0x31/0x40 drivers/usb/gadget/legacy/inode.c:2019
mount_fs+0x9c/0x2d0 fs/super.c:1223
vfs_kern_mount.part.25+0xcb/0x490 fs/namespace.c:976
vfs_kern_mount fs/namespace.c:2509 [inline]
do_new_mount fs/namespace.c:2512 [inline]
do_mount+0x41b/0x2d90 fs/namespace.c:2834
SYSC_mount fs/namespace.c:3050 [inline]
SyS_mount+0xb0/0x120 fs/namespace.c:3027
entry_SYSCALL_64_fastpath+0x1f/0xbe

Freed by task 9960:
save_stack_trace+0x1b/0x20 arch/x86/kernel/stacktrace.c:59
save_stack+0x43/0xd0 mm/kasan/kasan.c:513
set_track mm/kasan/kasan.c:525 [inline]
kasan_slab_free+0x72/0xc0 mm/kasan/kasan.c:590
slab_free_hook mm/slub.c:1357 [inline]
slab_free_freelist_hook mm/slub.c:1379 [inline]
slab_free mm/slub.c:2961 [inline]
kfree+0xed/0x2b0 mm/slub.c:3882
put_dev+0x124/0x160 drivers/usb/gadget/legacy/inode.c:163
gadgetfs_kill_sb+0x33/0x60 drivers/usb/gadget/legacy/inode.c:2027
deactivate_locked_super+0x8d/0xd0 fs/super.c:309
deactivate_super+0x21e/0x310 fs/super.c:340
cleanup_mnt+0xb7/0x150 fs/namespace.c:1112
__cleanup_mnt+0x1b/0x20 fs/namespace.c:1119
task_work_run+0x1a0/0x280 kernel/task_work.c:116
exit_task_work include/linux/task_work.h:21 [inline]
do_exit+0x18a8/0x2820 kernel/exit.c:878
do_group_exit+0x14e/0x420 kernel/exit.c:982
get_signal+0x784/0x1780 kernel/signal.c:2318
do_signal+0xd7/0x2130 arch/x86/kernel/signal.c:808
exit_to_usermode_loop+0x1ac/0x240 arch/x86/entry/common.c:157
prepare_exit_to_usermode arch/x86/entry/common.c:194 [inline]
syscall_return_slowpath+0x3ba/0x410 arch/x86/entry/common.c:263
entry_SYSCALL_64_fastpath+0xbc/0xbe

The buggy address belongs to the object at ffff88003a2bdae0
which belongs to the cache kmalloc-1024 of size 1024
The buggy address is located 24 bytes inside of
1024-byte region [ffff88003a2bdae0, ffff88003a2bdee0)
The buggy address belongs to the page:
page:ffffea0000e8ae00 count:1 mapcount:0 mapping: (null)
index:0x0 compound_mapcount: 0
flags: 0x100000000008100(slab|head)
raw: 0100000000008100 0000000000000000 0000000000000000 0000000100170017
raw: ffffea0000ed3020 ffffea0000f5f820 ffff88003e80efc0 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff88003a2bd980: fb fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff88003a2bda00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff88003a2bda80: fc fc fc fc fc fc fc fc fc fc fc fc fb fb fb fb
^
ffff88003a2bdb00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88003a2bdb80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================


>
>>
>> Can you get a usbmon trace for the test?
>
> I wasn't able to reproduce the crash with usbmon tracing enabled.
> Maybe usbmon tracing affects timings somehow and the race is quite
> less likely to be triggered. I'll keep trying though.
>
>> And can you enable debugging for the usbcore module?
>>
>> echo "module usbcore =p" >/sys/kernel/debug/dynamic_debug/control
>>
>
> I was able to reproduce the crash with debugging for usbcore enabled:
>
> https://gist.github.com/xairy/52bfb302d824eb55d39776eb156812cc
>
>>> > On commit b29794ec95c6856b316c2295904208bf11ffddd9 (4.12-rc4+) with
>>> > Alan's patch applied.
>>> >
>>> > gadgetfs: bound to dummy_udc driver
>>> > gadgetfs: suspended from state 2
>>> > INFO: rcu_sched detected stalls on CPUs/tasks:
>>> > 1-...: (0 ticks this GP) idle=966/140000000000000/0 softirq=37706/37706 fqs=5250
>>> > (detected by 2, t=21002 jiffies, g=26575, c=26574, q=183)
>>> > Sending NMI from CPU 2 to CPUs 1:
>>> > NMI backtrace for cpu 1
>>> > CPU: 1 PID: 1394 Comm: kworker/1:2 Not tainted 4.12.0-rc4+ #24
>>> > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
>>> > Workqueue: usb_hub_wq hub_event
>>> > task: ffff88003ebfb640 task.stack: ffffc900024fc000
>>> > RIP: 0010:rep_nop arch/x86/include/asm/processor.h:619 [inline]
>>> > RIP: 0010:cpu_relax arch/x86/include/asm/processor.h:624 [inline]
>>> > RIP: 0010:virt_spin_lock arch/x86/include/asm/qspinlock.h:63 [inline]
>>> > RIP: 0010:queued_spin_lock_slowpath+0x20/0x1a0 kernel/locking/qspinlock.c:421
>>> > RSP: 0018:ffffc900024ff9f0 EFLAGS: 00000002
>>> > RAX: 000000007d72e420 RBX: ffff88007d72e768 RCX: 0000000000010000
>>> > RDX: 0000000000000001 RSI: 000000007d72e420 RDI: ffff88007d72e768
>>> > RBP: ffffc900024ff9f0 R08: 0000000000000006 R09: 0000000000000020
>>> > R10: ffffc900024ffa50 R11: 0000000000d52301 R12: ffff88007da9c298
>>> > R13: 0000000000000000 R14: 0000000000000000 R15: ffffffff82cfc5a0
>>> > FS: 0000000000000000(0000) GS:ffff88003ed00000(0000) knlGS:0000000000000000
>>> > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> > CR2: 00007f728a487000 CR3: 00000000371e6000 CR4: 00000000000006e0
>>> > Call Trace:
>>> > __raw_spin_lock include/asm-generic/qspinlock.h:103 [inline]
>>> > _raw_spin_lock+0x1c/0x20 kernel/locking/spinlock.c:151
>>> > spin_lock include/linux/spinlock.h:299 [inline]
>>> > gadgetfs_suspend+0x32/0x90 drivers/usb/gadget/legacy/inode.c:1684
>>
>> Looks like it's waiting for the spinlock in gadgetfs_suspend. Nothing
>> else should be holding that lock.
>>
>> Were any other tasks stalled?
>
> It seems that I get these stalls quite rarely and I can't reproduce them.
>
>>
>>> > set_link_state+0x39c/0x440 drivers/usb/gadget/udc/dummy_hcd.c:455
>>> > dummy_hub_control+0x3e7/0x650 drivers/usb/gadget/udc/dummy_hcd.c:2074
>>> > rh_call_control drivers/usb/core/hcd.c:689 [inline]
>>> > rh_urb_enqueue drivers/usb/core/hcd.c:846 [inline]
>>> > usb_hcd_submit_urb+0x327/0xcf0 drivers/usb/core/hcd.c:1650
>>> > usb_submit_urb+0x355/0x6f0 drivers/usb/core/urb.c:542
>>> > usb_start_wait_urb+0x5f/0x110 drivers/usb/core/message.c:56
>>> > usb_internal_control_msg drivers/usb/core/message.c:100 [inline]
>>> > usb_control_msg+0xd9/0x120 drivers/usb/core/message.c:151
>>> > usb_clear_port_feature+0x46/0x60 drivers/usb/core/hub.c:412
>>> > hub_port_disable+0x65/0x1d0 drivers/usb/core/hub.c:4177
>>> > hub_port_init+0x10a/0xee0 drivers/usb/core/hub.c:4648
>>> > hub_port_connect drivers/usb/core/hub.c:4826 [inline]
>>> > hub_port_connect_change drivers/usb/core/hub.c:4999 [inline]
>>> > port_event drivers/usb/core/hub.c:5105 [inline]
>>> > hub_event+0xa0b/0x16e0 drivers/usb/core/hub.c:5189
>>> > process_one_work+0x1fb/0x4c0 kernel/workqueue.c:2097
>>> > process_scheduled_works kernel/workqueue.c:2157 [inline]
>>> > worker_thread+0x2ab/0x4c0 kernel/workqueue.c:2233
>>> > kthread+0x140/0x160 kernel/kthread.c:231
>>> > ret_from_fork+0x25/0x30 arch/x86/entry/entry_64.S:424
>>> > Code: 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 0f 1f 44 00 00 ba
>>> > 01 00 00 00 8b 07 85 c0 75 0a f0 0f b1 17 85 c0 75 f2 5d c3 f3 90 <eb>
>>> > ec 81 fe 00 01 00 00 0f 84 92 00 00 00 41 b8 01 01 00 00 b9
>>
>> Alan Stern
>>