Re: [PATCH v2] sd: Add timeout_sec and max_retries module parameter for sd

From: Bart Van Assche
Date: Tue Jun 03 2025 - 15:20:24 EST


On 6/3/25 1:40 PM, Masami Hiramatsu (Google) wrote:
For example, enabling CONFIG_DETECT_HUNG_TASK_BLOCKER, I got an
error message something like below (Note that this is 6.1 kernel
example, so the function names are a bit different.);

INFO: task udevd:5301 blocked for more than 122 seconds.
...
INFO: task udevd:5301 is blocked on a mutex likely owned by task kworker/u4:1:11.
task:kworker/u4:1state:D stack:0 pid:11ppid:2 flags:0x00004000
Workqueue: events_unbound async_run_entry_fn
Call Trace:
<TASK>
schedule+0x438/0x1490
? blk_mq_do_dispatch_ctx+0x70/0x1c0
schedule_timeout+0x253/0x790
? try_to_del_timer_sync+0xb0/0xb0
io_schedule_timeout+0x3f/0x80
wait_for_common_io+0xb4/0x160
blk_execute_rq+0x1bd/0x210
__scsi_execute+0x156/0x240
sd_revalidate_disk+0xa2a/0x2360
? kobject_uevent_env+0x158/0x430
sd_probe+0x364/0x47
really_probe+0x15a/0x3b0
__driver_probe_device+0x78/0xc0
driver_probe_device+0x24/0x1a0
__device_attach_driver+0x131/0x160
? coredump_store+0x50/0x50
bus_for_each_drv+0x9d/0xf0
__device_attach_async_helper+0x7e/0xd0 <=== device_lock()
...

How can this happen? The following code should prevent that a hung task
complaint appears while blk_execute_rq() is waiting:

static inline void blk_wait_io(struct completion *done)
{
/* Prevent hang_check timer from firing at us during very long I/O */
unsigned long timeout = sysctl_hung_task_timeout_secs * HZ / 2;

if (timeout)
while (!wait_for_completion_io_timeout(done, timeout))
;
else
wait_for_completion_io(done);
}

+/* timeout_sec defines the default value of the SCSI command timeout in second. */
+static int sd_timeout_sec = SD_TIMEOUT / HZ;
+module_param_named(timeout_sec, sd_timeout_sec, int, 0644);
+
+/*
+ * write_same_timeout_sec defines the default value of the WRITE SAME SCSI
+ * command timeout in second.
+ */
+static int sd_write_same_timeout_sec = SD_WRITE_SAME_TIMEOUT / HZ;
+module_param_named(write_same_timeout_sec, sd_write_same_timeout_sec, int, 0644);
+
+/* max_retries defines the default value of the max of SCSI command retries.*/
+static int sd_max_retries = SD_MAX_RETRIES;
+module_param_named(max_retries, sd_max_retries, int, 0644);

Shouldn't these parameters come from the SCSI host template rather than
introducing new kernel module parameters? It is impossible to make a good choice for these kernel module parameters if multiple types of SCSI
devices are present (USB, HDD, ...).

Bart.