[BUG] The UI hangs when system changes CPU frequency on Raspberry Pi 4B

From: Jian-Hong Pan
Date: Thu May 05 2022 - 03:00:50 EST


Hi,

We found the UI hangs totally when system changes CPU frequency on
Raspberry Pi 4B. The test kernel is mainline kernel 5.18-rc3.

According to the oops messages and the stack:

[ 1330.166158] INFO: task kworker/0:0:6 blocked for more than 120 seconds.
[ 1330.172963] Tainted: G W E 5.18.0-rc3 #61
[ 1330.178767] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[ 1330.186742] task:kworker/0:0 state:D stack: 0 pid: 6
ppid: 2 flags:0x00000008
[ 1330.186762] Workqueue: events dbs_work_handler
[ 1330.186779] Call trace:
[ 1330.186782] __switch_to+0xf8/0x150
[ 1330.186793] __schedule+0x37c/0x141c
[ 1330.186800] schedule+0x54/0xc4
[ 1330.186807] schedule_timeout+0xa4/0x190
[ 1330.186813] __wait_for_common+0xc4/0x224
[ 1330.186818] wait_for_completion_timeout+0x2c/0x40
[ 1330.186822] mbox_send_message+0xf0/0x1f4
[ 1330.186828] rpi_firmware_property_list+0xf8/0x280
[ 1330.186834] rpi_firmware_property+0x7c/0xc0
[ 1330.186839] raspberrypi_fw_set_rate+0x64/0xe4 [clk_raspberrypi]
[ 1330.186851] clk_change_rate+0xe4/0x470
[ 1330.186859] clk_core_set_rate_nolock+0x1d0/0x2ec
[ 1330.186864] clk_set_rate+0x44/0x190
[ 1330.186870] _set_opp+0x3b4/0x70c
[ 1330.186876] dev_pm_opp_set_rate+0x11c/0x2d0
[ 1330.186881] set_target+0x3c/0x4c
[ 1330.186887] __cpufreq_driver_target+0x400/0x7ac
[ 1330.186893] od_dbs_update+0xc8/0x1c0
[ 1330.186898] dbs_work_handler+0x4c/0x8c
[ 1330.186903] process_one_work+0x210/0x480
[ 1330.186910] worker_thread+0x180/0x520
[ 1330.186915] kthread+0xe4/0xf0
[ 1330.186920] ret_from_fork+0x10/0x20
[ 1330.186928] INFO: task kworker/u8:0:8 blocked for more than 120 seconds.
[ 1330.193744] Tainted: G W E 5.18.0-rc3 #61
[ 1330.199509] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[ 1330.207467] task:kworker/u8:0 state:D stack: 0 pid: 8
ppid: 2 flags:0x00000008
[ 1330.207485] Workqueue: events_unbound commit_work [drm_kms_helper]
[ 1330.207555] Call trace:
[ 1330.207558] __switch_to+0xf8/0x150
[ 1330.207572] __schedule+0x37c/0x141c
[ 1330.207579] schedule+0x54/0xc4
[ 1330.207586] schedule_preempt_disabled+0x1c/0x2c
[ 1330.207593] __mutex_lock.constprop.0+0x160/0x59c
[ 1330.207598] __mutex_lock_slowpath+0x40/0x8c
[ 1330.207602] mutex_lock+0x68/0x80
[ 1330.207607] clk_prepare_lock+0x4c/0xac
[ 1330.207616] clk_set_rate_range.part.0+0x3c/0x250
[ 1330.207622] clk_set_min_rate+0x3c/0xdc
[ 1330.207627] vc4_atomic_commit_tail+0x68c/0x77c [vc4]
[ 1330.207664] commit_tail+0xd8/0x210 [drm_kms_helper]
[ 1330.207707] commit_work+0x20/0x30 [drm_kms_helper]
[ 1330.207748] process_one_work+0x210/0x480
[ 1330.207755] worker_thread+0x180/0x520
[ 1330.207760] kthread+0xe4/0xf0
[ 1330.207765] ret_from_fork+0x10/0x20

Looks like clk_raspberrypi tries to set/change rate by invoking
raspberrypi_fw_set_rate(), but the transaction with firmware is
timeout. There might be a lock shared with others, but held by it. For
example, the second oops message from drm_kms_helper -> vc4 shows
clk_prepare_lock() is waiting to be scheduled.

The UI hangs until raspberrypi-firmware gets the failed returned error
from mbox_send_message().

[ 4887.601985] raspberrypi-firmware soc:firmware: mbox_send_message returned -62
[ 4887.609269] raspberrypi-clk soc:firmware:clocks: Failed to change
fw-clk-arm frequency: -62

The serial console is alive when the UI hangs.

I put all of the findings on https://bugzilla.kernel.org/show_bug.cgi?id=215875

Jian-Hong Pan