Re: [PATCH v2 2/2] scsi: sd: Rework asynchronous resume support

From: Bart Van Assche
Date: Fri Aug 12 2022 - 11:53:12 EST


On 8/12/22 03:48, Geert Uytterhoeven wrote:
> sd_submit_start() is called once during suspend, and once during
> resume. It does not hang.
>
> Reading from /dev/sda hangs after resume (not in sd_submit_start(),
> which is never called for reading).
>
> Two tasks are blocked in blk_mq_get_tag() calling io_schedule():
>
> task:kworker/7:1 state:D stack: 0 pid: 122 ppid: 2 flags:0x00000008
> Workqueue: events ata_scsi_dev_rescan
> Call trace:
> __switch_to+0xbc/0x124
> __schedule+0x540/0x71c
> schedule+0x58/0xa0
> io_schedule+0x18/0x34
> blk_mq_get_tag+0x138/0x244
> __blk_mq_alloc_requests+0x130/0x2f0
> blk_mq_alloc_request+0x74/0xa8
> scsi_alloc_request+0x10/0x30
> __scsi_execute+0x5c/0x18c
> scsi_vpd_inquiry+0x7c/0xdc
> scsi_get_vpd_size+0x34/0xa8
> scsi_get_vpd_buf+0x28/0xf4
> scsi_attach_vpd+0x44/0x170
> scsi_rescan_device+0x30/0x98
> ata_scsi_dev_rescan+0xc8/0xfc
> process_one_work+0x2e0/0x474
> worker_thread+0x1cc/0x270
> kthread+0xd8/0xe8
> ret_from_fork+0x10/0x20
>
> task:hd state:D stack: 0 pid: 1163 ppid: 1076 flags:0x00000000
> Call trace:
> __switch_to+0xbc/0x124
> __schedule+0x540/0x71c
> schedule+0x58/0xa0
> io_schedule+0x18/0x34
> blk_mq_get_tag+0x138/0x244
> __blk_mq_alloc_requests+0x130/0x2f0
> blk_mq_submit_bio+0x44c/0x5b4
> __submit_bio+0x24/0x5c
> submit_bio_noacct_nocheck+0x8c/0x178
> submit_bio_noacct+0x380/0x3b0
> submit_bio+0x34/0x3c
> mpage_bio_submit+0x28/0x38
> mpage_readahead+0xa8/0x178
> blkdev_readahead+0x14/0x1c
> read_pages+0x4c/0x158
> page_cache_ra_unbounded+0xd8/0x174
> do_page_cache_ra+0x40/0x4c
> page_cache_ra_order+0x14/0x1c
> ondemand_readahead+0x124/0x2fc
> page_cache_sync_ra+0x50/0x54
> filemap_read+0x130/0x6e8
> blkdev_read_iter+0xf0/0x164
> new_sync_read+0x74/0xc0
> vfs_read+0xbc/0xd8
> ksys_read+0x6c/0xd4
> __arm64_sys_read+0x14/0x1c
> invoke_syscall+0x70/0xf4
> el0_svc_common.constprop.0+0xbc/0xf0
> do_el0_svc+0x18/0x20
> el0_svc+0x30/0x84
> el0t_64_sync_handler+0x90/0xf8
> el0t_64_sync+0x14c/0x150

Hi Geert,

All that can be concluded from the above is that blk_mq_get_tag() is waiting for other I/O request(s) to finish. One or more other requests are in progress and either scsi_done() has not been called for these requests or the error handler got stuck. Since the issue reported above is not observed with other ATA interfaces, this may be related to the ATA interface driver used in your test setup.
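
To illustrate the reasoning above, here is a minimal user-space sketch of a fixed-depth tag pool. It is a toy model only, not the kernel's sbitmap-based implementation: the TagPool class, its get_tag()/put_tag() methods and the depth of 2 are hypothetical names and values chosen for illustration. It shows that once every tag is held by a request that never completes, any further allocation waits, which is the state both tasks in the traces appear to be in.

```python
import threading


class TagPool:
    """Toy model of a fixed-depth tag set (hypothetical, not kernel code)."""

    def __init__(self, depth):
        self._free = list(range(depth))
        self._cond = threading.Condition()

    def get_tag(self, timeout=None):
        """Return a free tag, waiting up to `timeout` seconds; None on timeout."""
        with self._cond:
            # Sleep until a tag is free, roughly like io_schedule() in the traces.
            if not self._cond.wait_for(lambda: self._free, timeout=timeout):
                return None
            return self._free.pop()

    def put_tag(self, tag):
        """Model request completion: release the tag and wake one waiter."""
        with self._cond:
            self._free.append(tag)
            self._cond.notify()


pool = TagPool(depth=2)
held = [pool.get_tag(), pool.get_tag()]  # two requests in flight, never completed
print(pool.get_tag(timeout=0.1))         # times out: no completion has freed a tag
pool.put_tag(held.pop())                 # a completion finally arrives
print(pool.get_tag(timeout=0.1))         # allocation succeeds again
```

In this model the only way out of the wait is a put_tag() call, which corresponds to the missing completion (or a stuck error handler) for the requests that still hold tags.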

Bart.