Re: [LKP] a9f38e1dec [ 245.678853] INFO: task mount:580 blocked for more than 120 seconds.

From: jianchao.wang
Date: Thu Nov 08 2018 - 20:38:33 EST




On 11/7/18 11:17 PM, Jens Axboe wrote:
> On 11/7/18 1:22 AM, jianchao.wang wrote:
>>
>>
>> On 11/7/18 11:16 AM, kernel test robot wrote:
>>> Greetings,
>>>
>>> 0day kernel testing robot got the below dmesg and the first bad commit is
>>>
>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__git.kernel.org_pub_scm_linux_kernel_git_torvalds_linux.git&d=DwIBaQ&c=RoP1YumCXCgaWHvlZYR8PZh8Bv7qIrMUB65eapI_JnE&r=7WdAxUBeiTUTCy8v-7zXyr4qk7sx26ATvfo6QSTvZyQ&m=8qsZc_c8QfZOGoITiebvlt5BO9GBK61cAY4xLOCegFU&s=ZGRhkdRCPvSKvbZzx-HAQtWcDZfzce60e2TMvBQyORo&e= master
>>>
>>> commit a9f38e1dec107af70d81338332494bf0a1e76597
>>> Author: Omar Sandoval <osandov@xxxxxx>
>>> AuthorDate: Mon Oct 15 09:21:34 2018 -0600
>>> Commit: Jens Axboe <axboe@xxxxxxxxx>
>>> CommitDate: Tue Oct 16 09:50:14 2018 -0600
>>>
>>> floppy: convert to blk-mq
>>>
>>> This driver likes to fetch requests from all over the place, so make
>>> queue_rq put requests on a list so that the logic stays the same. Tested
>>> with QEMU.
>>>
>>> Signed-off-by: Omar Sandoval <osandov@xxxxxx>
>>>
>>> Converted to blk_mq_init_sq_queue() and fixed a few spots where the
>>> tag_set leaked on cleanup.
>>>
>>> Signed-off-by: Jens Axboe <axboe@xxxxxxxxx>
>>>
>>> 6ec3938cff ataflop: convert to blk-mq
>>> a9f38e1dec floppy: convert to blk-mq
>>> 8053e5b93e Merge tag 'trace-v4.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
>>> 337734cbca Add linux-next specific files for 20181106
>>> +---------------------------------------------------------+------------+------------+------------+---------------+
>>> | | 6ec3938cff | a9f38e1dec | 8053e5b93e | next-20181106 |
>>> +---------------------------------------------------------+------------+------------+------------+---------------+
>>> | boot_successes | 251 | 68 | 69 | 74 |
>>> | boot_failures | 48 | 27 | 25 | 23 |
>>> | Mem-Info | 48 | 12 | 15 | 7 |
>>> | invoked_oom-killer:gfp_mask=0x | 6 | 4 | 3 | 2 |
>>> | EIP:iov_iter_fault_in_readable | 4 | 1 | | |
>>> | Out_of_memory:Kill_process | 3 | 2 | 2 | |
>>> | Out_of_memory_and_no_killable_processes | 3 | 2 | 2 | |
>>> | Kernel_panic-not_syncing:System_is_deadlocked_on_memory | 3 | 2 | 2 | |
>>> | EIP:__put_user_4 | 1 | | | |
>>> | INFO:task_blocked_for_more_than#seconds | 0 | 15 | 10 | 16 |
>>> | EIP:lock_is_held_type | 0 | 1 | 0 | 1 |
>>> | EIP:native_safe_halt | 0 | 3 | | |
>>> | Kernel_panic-not_syncing:hung_task:blocked_tasks | 0 | 15 | 10 | 16 |
>>> | EIP:strnlen_user | 0 | 1 | | |
>>> | EIP:_local_bh_enable | 0 | 1 | | |
>>> | EIP:__lock_acquire | 0 | 1 | 1 | 1 |
>>> | EIP:vprintk_emit | 0 | 1 | | |
>>> | EIP:do_softirq_own_stack | 0 | 0 | 1 | |
>>> | EIP:default_idle | 0 | 0 | 2 | 1 |
>>> | EIP:reschedule_interrupt | 0 | 0 | 1 | |
>>> | EIP:delay_tsc | 0 | 0 | 0 | 1 |
>>> | EIP:copy_page_from_iter | 0 | 0 | 0 | 1 |
>>> | EIP:_cond_resched | 0 | 0 | 0 | 1 |
>>> +---------------------------------------------------------+------------+------------+------------+---------------+
>>>
>>> [ 244.196551] random: get_random_u32 called from arch_align_stack+0x2d/0x40 with crng_init=1
>>> [ 245.254485] _warn_unseeded_randomness: 54 callbacks suppressed
>>> [ 245.254491] random: get_random_u32 called from copy_process+0x1c8/0x17b0 with crng_init=1
>>> [ 245.257366] random: get_random_u32 called from arch_pick_mmap_layout+0x110/0x140 with crng_init=1
>>> [ 245.258675] random: get_random_u32 called from load_elf_binary+0x2b6/0xb48 with crng_init=1
>>> [ 245.678853] INFO: task mount:580 blocked for more than 120 seconds.
>>> [ 245.679906] Tainted: G T 4.19.0-rc6-00172-ga9f38e1 #1
>>> [ 245.680959] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>> [ 245.682181] mount D 6372 580 1 0x00000004
>>> [ 245.683023] Call Trace:
>>> [ 245.683425] __schedule+0x2df/0x570
>>> [ 245.683975] schedule+0x2d/0x80
>>> [ 245.684476] schedule_timeout+0x19d/0x330
>>> [ 245.685090] ? wait_for_common+0xa5/0x170
>>> [ 245.685735] wait_for_common+0xac/0x170
>>> [ 245.686339] ? do_sched_yield+0x90/0x90
>>> [ 245.686935] wait_for_completion+0x12/0x20
>>> [ 245.687571] __floppy_read_block_0+0xfb/0x150
>>> [ 245.688244] ? floppy_resume+0x40/0x40
>>> [ 245.688844] floppy_revalidate+0x20f/0x240
>>> [ 245.689486] check_disk_change+0x43/0x60
>>> [ 245.690087] floppy_open+0x1ea/0x360
>>> [ 245.690653] __blkdev_get+0xb4/0x4d0
>>> [ 245.691212] ? blkdev_get+0x1db/0x370
>>> [ 245.691777] blkdev_get+0x1f3/0x370
>>> [ 245.692351] ? path_put+0x15/0x20
>>> [ 245.692871] ? lookup_bdev+0x4b/0x90
>>> [ 245.693539] blkdev_get_by_path+0x3d/0x80
>>> [ 245.694165] mount_bdev+0x2a/0x190
>>> [ 245.694695] squashfs_mount+0x10/0x20
>>> [ 245.695271] ? squashfs_alloc_inode+0x30/0x30
>>> [ 245.695960] mount_fs+0xf/0x90
>>> [ 245.696451] vfs_kern_mount+0x43/0x130
>>> [ 245.697036] do_mount+0x187/0xc40
>>> [ 245.697563] ? memdup_user+0x28/0x50
>>> [ 245.698124] ksys_mount+0x60/0xc0
>>> [ 245.698639] sys_mount+0x19/0x20
>>> [ 245.699167] do_int80_syscall_32+0x61/0x130
>>> [ 245.699813] entry_INT80_32+0xc7/0xc7
>>> [ 245.700391] EIP: 0x47f46fee
>>> [ 245.700829] Code: 11 83 c8 ff eb e2 90 90 90 90 90 90 57 56 53 8b 7c 24 20 8b 74 24 1c 8b 54 24 18 8b 4c 24 14 8b 5c 24 10 b8 15 00 00 00 cd 80 <5b> 5e 5f 3d 01 f0 ff ff 73 01 c3 e8 e8 80 03 00 81 c1 f6 5f 09 00
>>> [ 245.703680] EAX: ffffffda EBX: affa4efa ECX: affa4f03 EDX: 080d3630
>>> [ 245.704642] ESI: 00008010 EDI: 00000000 EBP: 00000000 ESP: affa45c0
>>> [ 245.705637] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b EFLAGS: 00000282
>>
>> There is not a q->queue_lock as blk-legacy to serialize floppy_queue_rq from different contexts.
>
> Doesn't look immediately racy, the internal queue list is protected by
> the floppy_lock, and the busy start is marked by atomic bit setting.

Oops, I missed the fdc_busy bit set.
I just looked at the patch and found the biggest change was this.
Sorry for the noise.

> Do you see an issue with it, or are you just pontificating?
>
> FWIW, just ran qemu with two floppy drives and concurrent access,
> didn't see an issue.
>

Thanks
Jianchao