[ BUG: Invalid wait context ], BUG: scheduling while atomic: swapper/0/1/0x00000002 on kernel 6.2.0-rc4

From: Erhard F.
Date: Mon Jan 16 2023 - 19:45:31 EST


Getting this at boot on my Talos II POWER9 box:

[...]
=============================
[ BUG: Invalid wait context ]
6.2.0-rc4-P9 #1 Tainted: G T
-----------------------------
swapper/0/1 is trying to lock:
c0000000021b57c8 (cpuhp_state_mutex){+.+.}-{3:3}, at: __cpuhp_setup_state_cpuslocked+0xb0/0x5f0
other info that might help us debug this:
context-{4:4}
3 locks held by swapper/0/1:
#0: c00000000dd738f8 (&dev->mutex){....}-{3:3}, at: __driver_attach+0x124/0x330
#1: c00000000218ef58 (nest_init_lock){+.+.}-{2:2}, at: init_imc_pmu+0x1104/0x1790
#2: c0000000021b58f0 (cpu_hotplug_lock){++++}-{0:0}, at: init_imc_pmu+0x137c/0x1790
stack backtrace:
CPU: 0 PID: 1 Comm: swapper/0 Tainted: G T 6.2.0-rc4-P9 #1
Hardware name: T2P9D01 REV 1.01 POWER9 0x4e1202 opal:skiboot-bc106a0 PowerNV
Call Trace:
[c000000006c67050] [c0000000012a3c80] dump_stack_lvl+0xb4/0x124 (unreliable)
[c000000006c67090] [c0000000001e233c] __lock_acquire+0x351c/0x3550
[c000000006c67200] [c0000000001e33f0] lock_acquire+0x1a0/0x5a0
[c000000006c67300] [c0000000012fcd08] __mutex_lock+0xe8/0x610
[c000000006c673f0] [c0000000000f3680] __cpuhp_setup_state_cpuslocked+0xb0/0x5f0
[c000000006c674b0] [c0000000000f4ef8] __cpuhp_setup_state+0x168/0x3f0
[c000000006c67530] [c0000000000d3e5c] init_imc_pmu+0x137c/0x1790
[c000000006c67690] [c0000000000c4764] opal_imc_counters_probe+0x3a4/0x7e0
[c000000006c677e0] [c000000000dd09c0] platform_probe+0xa0/0x150
[c000000006c67860] [c000000000dcaf20] really_probe+0x170/0x590
[c000000006c67900] [c000000000dcb448] __driver_probe_device+0x108/0x1d0
[c000000006c67940] [c000000000dcb594] driver_probe_device+0x84/0x1a0
[c000000006c67990] [c000000000dcb9c4] __driver_attach+0x134/0x330
[c000000006c679e0] [c000000000dc5e6c] bus_for_each_dev+0xdc/0x150
[c000000006c67a30] [c000000000dc9fc0] driver_attach+0x40/0x70
[c000000006c67a60] [c000000000dc92b8] bus_add_driver+0x338/0x420
[c000000006c67b10] [c000000000dcd8d4] driver_register+0x154/0x310
[c000000006c67ba0] [c000000000dd0144] __platform_driver_register+0x54/0x80
[c000000006c67bd0] [c00000000203183c] opal_imc_driver_init+0x60/0x90
[c000000006c67c00] [c000000000011ee8] do_one_initcall+0xc8/0x630
[c000000006c67cf0] [c0000000020036cc] kernel_init_freeable+0x72c/0x864
[c000000006c67de0] [c000000000012b98] kernel_init+0x28/0x1d0
[c000000006c67e50] [c00000000000ce5c] ret_from_kernel_thread+0x5c/0x64
--- interrupt: 0 at 0x0
NIP: 0000000000000000 LR: 0000000000000000 CTR: 0000000000000000
REGS: c000000006c67e80 TRAP: 0000 Tainted: G T (6.2.0-rc4-P9)
MSR: 0000000000000000 <> CR: 00000000 XER: 00000000
CFAR: 0000000000000000 IRQMASK: 0
GPR00: 0000000000000000 c000000006c68000 0000000000000000 0000000000000000
GPR04: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR12: 0000000000000000 0000000000000000 c000000000012b78 0000000000000000
GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR28: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
NIP [0000000000000000] 0x0
LR [0000000000000000] 0x0
--- interrupt: 0
BUG: scheduling while atomic: swapper/0/1/0x00000002
INFO: lockdep is turned off.
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Tainted: G T 6.2.0-rc4-P9 #1
Hardware name: T2P9D01 REV 1.01 POWER9 0x4e1202 opal:skiboot-bc106a0 PowerNV
Call Trace:
[c000000006c66f90] [c0000000012a3c80] dump_stack_lvl+0xb4/0x124 (unreliable)
[c000000006c66fd0] [c000000000163c68] __schedule_bug+0xf8/0x120
[c000000006c67050] [c0000000012f8c28] __schedule+0x1218/0x14d0
[c000000006c67150] [c0000000012f8fa4] schedule+0xc4/0x200
[c000000006c671d0] [c0000000013052e4] schedule_timeout+0x174/0x1f0
[c000000006c672b0] [c0000000012fa20c] __wait_for_common+0x15c/0x310
[c000000006c67360] [c0000000000f2ce8] cpuhp_issue_call+0x398/0x570
[c000000006c673f0] [c0000000000f3778] __cpuhp_setup_state_cpuslocked+0x1a8/0x5f0
[c000000006c674b0] [c0000000000f4ef8] __cpuhp_setup_state+0x168/0x3f0
[c000000006c67530] [c0000000000d3e5c] init_imc_pmu+0x137c/0x1790
[c000000006c67690] [c0000000000c4764] opal_imc_counters_probe+0x3a4/0x7e0
[c000000006c677e0] [c000000000dd09c0] platform_probe+0xa0/0x150
[c000000006c67860] [c000000000dcaf20] really_probe+0x170/0x590
[c000000006c67900] [c000000000dcb448] __driver_probe_device+0x108/0x1d0
[c000000006c67940] [c000000000dcb594] driver_probe_device+0x84/0x1a0
[c000000006c67990] [c000000000dcb9c4] __driver_attach+0x134/0x330
[c000000006c679e0] [c000000000dc5e6c] bus_for_each_dev+0xdc/0x150
[c000000006c67a30] [c000000000dc9fc0] driver_attach+0x40/0x70
[c000000006c67a60] [c000000000dc92b8] bus_add_driver+0x338/0x420
[c000000006c67b10] [c000000000dcd8d4] driver_register+0x154/0x310
[c000000006c67ba0] [c000000000dd0144] __platform_driver_register+0x54/0x80
[c000000006c67bd0] [c00000000203183c] opal_imc_driver_init+0x60/0x90
[c000000006c67c00] [c000000000011ee8] do_one_initcall+0xc8/0x630
[c000000006c67cf0] [c0000000020036cc] kernel_init_freeable+0x72c/0x864
[c000000006c67de0] [c000000000012b98] kernel_init+0x28/0x1d0
[c000000006c67e50] [c00000000000ce5c] ret_from_kernel_thread+0x5c/0x64
--- interrupt: 0 at 0x0
NIP: 0000000000000000 LR: 0000000000000000 CTR: 0000000000000000
REGS: c000000006c67e80 TRAP: 0000 Tainted: G T (6.2.0-rc4-P9)
MSR: 0000000000000000 <> CR: 00000000 XER: 00000000
CFAR: 0000000000000000 IRQMASK: 0
GPR00: 0000000000000000 c000000006c68000 0000000000000000 0000000000000000
GPR04: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR12: 0000000000000000 0000000000000000 c000000000012b78 0000000000000000
GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR28: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
NIP [0000000000000000] 0x0
LR [0000000000000000] 0x0
--- interrupt: 0
BUG: scheduling while atomic: swapper/0/1/0x00000000
INFO: lockdep is turned off.
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Tainted: G W T 6.2.0-rc4-P9 #1
Hardware name: T2P9D01 REV 1.01 POWER9 0x4e1202 opal:skiboot-bc106a0 PowerNV
Call Trace:
[c000000006c66f90] [c0000000012a3c80] dump_stack_lvl+0xb4/0x124 (unreliable)
[c000000006c66fd0] [c000000000163c68] __schedule_bug+0xf8/0x120
[c000000006c67050] [c0000000012f8c28] __schedule+0x1218/0x14d0
[c000000006c67150] [c0000000012f8fa4] schedule+0xc4/0x200
[c000000006c671d0] [c0000000013052e4] schedule_timeout+0x174/0x1f0
[c000000006c672b0] [c0000000012fa20c] __wait_for_common+0x15c/0x310
[c000000006c67360] [c0000000000f2ce8] cpuhp_issue_call+0x398/0x570
[c000000006c673f0] [c0000000000f3778] __cpuhp_setup_state_cpuslocked+0x1a8/0x5f0
[c000000006c674b0] [c0000000000f4ef8] __cpuhp_setup_state+0x168/0x3f0
[c000000006c67530] [c0000000000d2cd8] init_imc_pmu+0x1f8/0x1790
[c000000006c67690] [c0000000000c4764] opal_imc_counters_probe+0x3a4/0x7e0
[c000000006c677e0] [c000000000dd09c0] platform_probe+0xa0/0x150
[c000000006c67860] [c000000000dcaf20] really_probe+0x170/0x590
[c000000006c67900] [c000000000dcb448] __driver_probe_device+0x108/0x1d0
[c000000006c67940] [c000000000dcb594] driver_probe_device+0x84/0x1a0
[c000000006c67990] [c000000000dcb9c4] __driver_attach+0x134/0x330
[c000000006c679e0] [c000000000dc5e6c] bus_for_each_dev+0xdc/0x150
[c000000006c67a30] [c000000000dc9fc0] driver_attach+0x40/0x70
[c000000006c67a60] [c000000000dc92b8] bus_add_driver+0x338/0x420
[c000000006c67b10] [c000000000dcd8d4] driver_register+0x154/0x310
[c000000006c67ba0] [c000000000dd0144] __platform_driver_register+0x54/0x80
[c000000006c67bd0] [c00000000203183c] opal_imc_driver_init+0x60/0x90
[c000000006c67c00] [c000000000011ee8] do_one_initcall+0xc8/0x630
[c000000006c67cf0] [c0000000020036cc] kernel_init_freeable+0x72c/0x864
[c000000006c67de0] [c000000000012b98] kernel_init+0x28/0x1d0
[c000000006c67e50] [c00000000000ce5c] ret_from_kernel_thread+0x5c/0x64
--- interrupt: 0 at 0x0
NIP: 0000000000000000 LR: 0000000000000000 CTR: 0000000000000000
REGS: c000000006c67e80 TRAP: 0000 Tainted: G W T (6.2.0-rc4-P9)
MSR: 0000000000000000 <> CR: 00000000 XER: 00000000
CFAR: 0000000000000000 IRQMASK: 0
GPR00: 0000000000000000 c000000006c68000 0000000000000000 0000000000000000
GPR04: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR12: 0000000000000000 0000000000000000 c000000000012b78 0000000000000000
GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR28: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
NIP [0000000000000000] 0x0
LR [0000000000000000] 0x0
--- interrupt: 0
[...]
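
A note on reading the first report: the `{outer:inner}` pair printed after each lock name is lockdep's wait-type annotation. The sketch below decodes the held-lock lines from the log above and replays a simplified form of the check. The numeric meanings in `WAIT_TYPES` are an assumption (they match the lockdep enum when CONFIG_PROVE_RAW_LOCK_NESTING is disabled, which I have not verified against the attached config), and `parse_lock` and `find_violation` are hypothetical helper names, not kernel functions.

```python
import re

# ASSUMPTION: wait-type numbering for a kernel built without
# CONFIG_PROVE_RAW_LOCK_NESTING; other configs shift these values.
WAIT_TYPES = {0: "unchecked", 1: "free", 2: "spin", 3: "sleep", 4: "max"}

# Matches e.g. "(nest_init_lock){+.+.}-{2:2}" and captures name, outer, inner.
LOCK_RE = re.compile(r"\((?P<name>[^)]+)\)\{[^}]*\}-\{(?P<outer>\d+):(?P<inner>\d+)\}")

WANTED = "c0000000021b57c8 (cpuhp_state_mutex){+.+.}-{3:3}, at: __cpuhp_setup_state_cpuslocked+0xb0/0x5f0"
HELD = [
    "#0: c00000000dd738f8 (&dev->mutex){....}-{3:3}, at: __driver_attach+0x124/0x330",
    "#1: c00000000218ef58 (nest_init_lock){+.+.}-{2:2}, at: init_imc_pmu+0x1104/0x1790",
    "#2: c0000000021b58f0 (cpu_hotplug_lock){++++}-{0:0}, at: init_imc_pmu+0x137c/0x1790",
]


def parse_lock(line):
    """Return (name, outer, inner) for a lockdep lock line, or None."""
    m = LOCK_RE.search(line)
    if m is None:
        return None
    return m["name"], int(m["outer"]), int(m["inner"])


def find_violation(context_inner, held_lines, wanted_line):
    """Simplified wait-context check.

    Each held lock with a non-zero inner type can only lower the allowed
    wait type. Returns (wanted_lock, most_restrictive_held_lock) if the
    wanted lock's outer type exceeds what is allowed, else None.
    """
    allowed = context_inner
    blocker = None
    for line in held_lines:
        lock = parse_lock(line)
        if lock is None:
            continue
        name, _outer, inner = lock
        if inner == 0:  # wait type 0 is not checked
            continue
        if inner < allowed:
            allowed = inner
            blocker = name
    name, outer, _inner = parse_lock(wanted_line)
    if outer > allowed:
        return name, blocker
    return None


if __name__ == "__main__":
    # context-{4:4} in the log gives the starting value 4.
    print(find_violation(4, HELD, WANTED))
    # prints ('cpuhp_state_mutex', 'nest_init_lock')
```

Under that assumed numbering, nest_init_lock (type 2, spinning) is held while cpuhp_state_mutex (type 3, sleeping) is requested, which would be consistent with the "scheduling while atomic" reports that follow from the same init_imc_pmu() call path.
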

Some data about the machine:
# inxi -bZ
System:
  Host: T1000 Kernel: 6.2.0-rc4-P9 arch: ppc64 bits: 64 Console: pty pts/0
    Distro: Gentoo Base System release 2.9
Machine:
  Type: PPC System: T2P9D01 REV 1.01 details: N/A
CPU:
  Info: 2x 4-core POWER9 altivec supported [MT MCP SMP] speed (MHz):
    avg: 2581 min/max: 2154/3800
Graphics:
  Device-1: ASPEED Graphics Family driver: N/A
  Device-2: AMD R480 [Radeon X800 GTO] driver: radeon v: kernel
  Device-3: N/A driver: N/A
  Display: x11 server: X.Org v: 21.1.1 driver: X: loaded: radeon
    gpu: radeon resolution: 1440x900~60Hz
  OpenGL: renderer: llvmpipe (LLVM 15.0.6 128 bits) v: 4.5 Mesa 22.3.3
Network:
  Device-1: Broadcom NetXtreme BCM5719 Gigabit Ethernet PCIe driver: tg3
  Device-2: Broadcom NetXtreme BCM5719 Gigabit Ethernet PCIe driver: tg3
Drives:
  Local Storage: total: 447.13 GiB used: 11.63 GiB (2.6%)
Info:
  Processes: 399 Uptime: 1m Memory: 54.7 GiB used: 3.05 GiB (5.6%)
    Shell: Bash inxi: 3.3.17


The issue is reproducible; I get it on every boot. The kernel .config and full dmesg are attached.

Regards,
Erhard

Attachment: dmesg_62-rc4_p9
Description: Binary data

Attachment: config_62-rc4_p9
Description: Binary data