Re: 6.0-rc1 regression block (blk_mq) / RCU task stuck errors + block-io hang

From: Bart Van Assche
Date: Fri Aug 19 2022 - 10:49:34 EST


On 8/19/22 05:01, Hans de Goede wrote:
I've been dogfooding 6.0-rc1 on my main workstation and I have hit
this pretty serious bug, serious enough for me to go back to 5.19

My dmesg is showing various blk_mq (RCU?) related lockdep splats
followed by some tasks getting stuck on disk-IO. E.g. "sync"
is guaranteed to hang, but other tasks too.

This seems to be mainly the case on "sd" disks (both sata
and USB) where as my main nvme drive seems fine, which has
probably saved me from worse issues...

Here are 4 task stuck reports from my last boot, where
I had to turn off the machine by keeping the power button
pressed for 4 seconds.

[ ... ]
>
Sorry for not being able to write a better bug-report but I don't have
the time to dive into this deeper. I hope this report is enough for
someone to have a clue what is going on.

Thank you for the detailed report. I think this report is detailed enough to root-cause this issue, something that was not possible before this report.

Please help with verifying whether this patch fixes this issue: "[PATCH] scsi: sd: Revert "Rework asynchronous resume support"" (https://lore.kernel.org/linux-scsi/20220816172638.538734-1-bvanassche@xxxxxxx/).

Thanks,

Bart.