Re: [REGRESSION] CPUIDLE_FLAG_RCU_IDLE, blk_mq_freeze_queue_wait() and slow-stuck reboots

From: Mark Rutland
Date: Mon Mar 20 2023 - 12:16:06 EST


On Mon, Mar 20, 2023 at 01:52:47PM +0000, Mark Rutland wrote:
> On Tue, Mar 14, 2023 at 11:00:04PM +0000, Alexey Klimov wrote:
> > #regzbot introduced: 0c5ffc3d7b15
> > #regzbot title: CPUIDLE_FLAG_RCU_IDLE, blk_mq_freeze_queue_wait() and slow-stuck reboots
> >
> > The upstream changes are being merged into android-mainline repo and at some
> > point we started to observe kernel panics on reboot or long reboot times.
>
> Where can I find the android-mainline repo, and which specific branch/commit
> from that repo is being merged in?

I assume that was the android-mainline branch in:

https://android.googlesource.com/kernel/common/

... and I had a go with commit:

8338670fd5bdf8d7 ("Merge "Merge 36289a03bcd3 ("Merge tag 'v6.3-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6") into android-mainline" into android-mainline")

... as that was the commit immediately before your local revert:

a32cec8e3f2253bc ("ANDROID: Revert "cpuidle, dt: Push RCU-idle into driver")

Testing on Juno R2 with defconfig + PROVE_LOCKING=y + DEBUG_LOCKDEP=y, I cannot
reproduce the reboot issue; everything seems to work just fine.

Can you say which config you're using?

Just to check: are you using a pristine version of that tree, or do you have
any vendor hooks present? I note that there are special hooks added to the
cpuidle and PSCI code, and I can imagine those might expect the old behaviour
w.r.t. RCU and idle, so ruling those out would help.

Thanks,
Mark.

>
> > The merge commit is 1f2d9ffc7a5f Merge tag 'sched-core-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.
> > Before, the reboot usually took significantly less than 15 seconds and after merge the reboot time fall in the range of 60-100 seconds.
> > At some point watchdog-like functionality or softdog start to worry about the system stuck somewhere nd panic the system.
> >
> > The delay is found to be in device's ->shutdown() methods called from kernel_restart():
> > void kernel_restart_prepare(char *cmd)
> > {
> > blocking_notifier_call_chain(&reboot_notifier_list, SYS_RESTART, cmd);
> > system_state = SYSTEM_RESTART;
> > usermodehelper_disable();
> > device_shutdown(); <---- here
> > }
> >
> > The driver in question is ufshcd and its ufshcd_wl_shutdown() shutdown method. It often blocks on scsi_device_quiesce() and upon manual checking it seems that it sleeps on blk_mq_freeze_queue_wait()/wait_event() in blk_freeze_queue():
> >
> > scsi_device_quiesce(struct scsi_device *sdev)
> > {
> > ...
> > blk_mq_freeze_queue(q);
> > ...
> > }
> > ||
> > V
> > void blk_freeze_queue(struct request_queue *q)
> > {
> > /*
> > * In the !blk_mq case we are only calling this to kill the
> > * q_usage_counter, otherwise this increases the freeze depth
> > * and waits for it to return to zero. For this reason there is
> > * no blk_unfreeze_queue(), and blk_freeze_queue() is not
> > * exported to drivers as the only user for unfreeze is blk_mq.
> > */
> > blk_freeze_queue_start(q);
> > blk_mq_freeze_queue_wait(q); <--- sleeps on wait_event() here
> > }
> >
> > Or in other words:
> >
> > [ 34.785050][ C4] sysrq: Show Blocked State
> > [ 34.785132][ C4] task:init state:D stack:9680 pid:1 ppid:0 flags:0x04000008
> > [ 34.785301][ C4] Call trace:
> > [ 34.785360][ C4] __switch_to+0x180/0x308
> > [ 34.785452][ C4] __schedule+0x61c/0x9f0
> > [ 34.785530][ C4] schedule+0x84/0xf4
> > [ 34.785602][ C4] blk_mq_freeze_queue_wait+0x78/0xbc
> > [ 34.785707][ C4] blk_freeze_queue+0x74/0x8c
> > [ 34.785850][ C4] blk_mq_freeze_queue+0x18/0x2c
> > [ 34.786033][ C4] scsi_device_quiesce+0x54/0xec
> > [ 34.786216][ C4] ufshcd_wl_shutdown+0x98/0xc0
> > [ 34.786396][ C4] device_shutdown+0x1a8/0x264
> > [ 34.786572][ C4] kernel_restart+0x48/0x11c
> > [ 34.786742][ C4] __arm64_sys_reboot+0x1a8/0x27c
> > [ 34.786927][ C4] invoke_syscall+0x60/0x130
> > [ 34.787096][ C4] el0_svc_common+0xbc/0x100
> > [ 34.787266][ C4] do_el0_svc+0x38/0xc4
> > [ 34.787420][ C4] el0_svc+0x34/0xc4
> > [ 34.787563][ C4] el0t_64_sync_handler+0x8c/0xfc
> > [ 34.787749][ C4] el0t_64_sync+0x1a4/0x1a8
> >
> >
> > However, bisect pointed to 0c5ffc3d7b15 cpuidle, dt: Push RCU-idle into driver
> >
> > ->BAD 924aed1646bf cpuidle, cpu_pm: Remove RCU fiddling from cpu_pm_{enter,exit}()
> > ->BAD a01353cf1896 cpuidle: Fix ct_idle_*() usage
> > ->BAD (doesn't compile, needs one missing header file) 0c5ffc3d7b15 cpuidle, dt: Push RCU-idle into driver
> > ->good c3d42418dca5 cpuidle, OMAP4: Push RCU-idle into driver
> >
> > Looks like adding CPUIDLE_FLAG_RCU_IDLE flag to idle driver caused this behaviour.
> > The minimal change that is required for this system to avoid the regression
> > would be one liner that removes the flag (below).
> >
> > But if it is a real regression, then other idle drivers if used will likely
> > cause this regression too withe same ufshcd driver. There is also a suspicion
> > that CPUIDLE_FLAG_RCU_IDLE just revealed or uncovered some other problem.
> >
> > Any thoughts on this? Some missing __cpuidle or noinstr annotations?
>
> As Peter has suggested in another reply, this might not be an *upstream*
> regression, but rather an interaction with those out-of-tree patches. If you
> can provide a pointer to the branch above that, it'll make it much easier to
> figure out what's going on.
>
> The code in mainline is unfortunately convoluted, but it doesn't look to be
> obviously wrong.
>
> FWIW, I've just tried v6.3-rc3 on Juno R2 in a few configurations (ACPI, DT,
> with PROVE_LOCKING=y and DEBUG_LOCKDEP=y, with PSCI_CHECKER=y) and I'm not
> seeing similar issues. So either I'm not tickling the same code path, or
> something else is involved.
>
> Thanks,
> Mark.