Re: [PATCH] block: fix deadlock between blk_mq_freeze_queue and blk_mq_dispatch_list

From: Ming Lei

Date: Mon Apr 20 2026 - 03:03:36 EST


On Mon, Apr 20, 2026 at 02:31:14PM +0800, Michael Wu wrote:
> I'd like to add some important information:
>
> The three processes I mentioned—Task 1838 (Back-P10-3), Task 619
> (android.hardwar), and Task 1865 (sp-control-1)—are all in an
> uninterruptible sleep state. Therefore, once Task 1865 (sp-control-1) is
> scheduled out using `preempt_schedule_notrace`, it cannot be scheduled back.
> The reason Task 1865 (sp-control-1) is in an uninterruptible sleep state is
> because `down_write` is waiting for `io_rwsem`.
>
> My analysis of the upstream kernel code doesn't seem to have found a fix for
> this issue. This situation should theoretically exist, but I don't have a
> platform to test this low-probability behavior. However, it's certain that
> this situation occurs during I/O scheduling algorithm switching and
> concurrent F2FS write operations.
>
> In this situation, `io_schedule_prepare` is not used. The path used in Task
> 1865 is `schedule->sched_submit_work->blk_flush_plug->blk_mq_dispatch_list`.
>
> As you said, this method is indeed not good, but I don't have a better idea
> to handle this deadlock situation.

Now I get it: blk_flush_plug() is called on a task that is already marked
as sleeping, so once the code block is preempted it can't run again, even
though it doesn't sleep anywhere itself.

Can you try the following change?

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index b7f77c165a6e..4217aaaa8e47 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6966,7 +6966,9 @@ static inline void sched_submit_work(struct task_struct *tsk)
 	 * If we are going to sleep and we have plugged IO queued,
 	 * make sure to submit it to avoid deadlocks.
 	 */
+	preempt_disable_notrace();
 	blk_flush_plug(tsk->plug, true);
+	preempt_enable_no_resched_notrace();

 	lock_map_release(&sched_map);
 }



thanks,
Ming