System hang with latest kernel v6.16.0-rc1 (rc2 & rc3)
From: Himanshu Madhani
Date: Thu Jul 03 2025 - 14:28:18 EST
Hi Folks,
We are seeing a kernel hang while booting after a new 6.16-rc1 kernel is installed.
Here's the stack trace that shows up:
[ 297.656683] systemd-shutdown[1]: Rebooting with kexec.
[ 513.790993] INFO: task kexec:19038 blocked for more than 122 seconds.
[ 513.868087] Not tainted 6.16.0-rc1.master.20250611.ol9.x86_64 #1
[ 513.946210] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 514.039923] task:kexec state:D stack:0 pid:19038 tgid:19038 ppid:1 task_flags:0x400100 flags:0x00004002
[ 514.172122] Call Trace:
[ 514.201356] <TASK>
[ 514.226438] __schedule+0x2d1/0x730
[ 514.268161] schedule+0x27/0x80
[ 514.305717] schedule_preempt_disabled+0x15/0x30
[ 514.360954] __mutex_lock.constprop.0+0x4be/0x8a0
[ 514.417232] msi_domain_get_virq+0xcc/0x110
[ 514.467279] pci_msix_write_tph_tag+0x3c/0x100
[ 514.520441] pcie_tph_set_st_entry+0x125/0x1d0
[ 514.573605] bnxt_irq_affinity_release+0x35/0x50 [bnxt_en]
[ 514.639258] irq_set_affinity_notifier+0xdd/0x130
[ 514.695534] bnxt_free_irq+0x6e/0x110 [bnxt_en]
[ 514.749746] __bnxt_close_nic.isra.0+0x1eb/0x220 [bnxt_en]
[ 514.815404] bnxt_close+0x3a/0x100 [bnxt_en]
[ 514.866498] __dev_close_many+0xab/0x220
[ 514.913423] __dev_change_flags+0x102/0x240
[ 514.963464] netif_change_flags+0x26/0x70
[ 515.011424] dev_change_flags+0x40/0xc0
[ 515.057304] devinet_ioctl+0x3aa/0x7a0
[ 515.102142] inet_ioctl+0x1d3/0x1f0
[ 515.143863] sock_do_ioctl+0x7a/0x140
[ 515.187667] __x64_sys_ioctl+0x9b/0x100
[ 515.233545] ? syscall_trace_enter+0x10c/0x1d0
[ 515.286704] do_syscall_64+0x84/0x940
[ 515.330502] ? refill_obj_stock+0x143/0x240
[ 515.380543] ? __dentry_kill+0x12e/0x190
[ 515.427459] ? __memcg_slab_free_hook+0xf4/0x150
[ 515.482698] ? __x64_sys_close+0x3d/0x80
[ 515.529616] ? kmem_cache_free+0x3fe/0x460
[ 515.578614] ? syscall_exit_work+0x118/0x150
[ 515.629695] ? arch_exit_to_user_mode_prepare.isra.0+0x9/0xb0
[ 515.698453] ? do_syscall_64+0xba/0x940
[ 515.744330] ? mod_memcg_lruvec_state+0x1a2/0x1f0
[ 515.800608] ? __lruvec_stat_mod_folio+0x83/0xd0
[ 515.855843] ? __folio_mod_stat+0x26/0x80
[ 515.903801] ? set_ptes.isra.0+0x36/0x90
[ 515.950723] ? do_anonymous_page+0x103/0x4b0
[ 516.001802] ? __handle_mm_fault+0x394/0x6f0
[ 516.052886] ? count_memcg_events+0x15a/0x1a0
[ 516.105008] ? handle_mm_fault+0x24a/0x350
[ 516.154003] ? do_user_addr_fault+0x221/0x690
[ 516.206122] ? arch_exit_to_user_mode_prepare.isra.0+0x9/0xb0
[ 516.274887] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 516.335330] RIP: 0033:0x7fc96e903bcb
[ 516.378086] RSP: 002b:00007ffcc7f78518 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
[ 516.468683] RAX: ffffffffffffffda RBX: 000055dc432d8f80 RCX: 00007fc96e903bcb
[ 516.554080] RDX: 00007ffcc7f78680 RSI: 0000000000008914 RDI: 0000000000000003
[ 516.639482] RBP: 0000000000000000 R08: 0000000000000007 R09: 0000000000000007
[ 516.724882] R10: 000000000000005e R11: 0000000000000202 R12: 000055dc095468dd
[ 516.810278] R13: 000055dc095468e4 R14: 00007ffcc7f78680 R15: 000055dc432d9020
[ 516.895676] </TASK>
[ 516.921808] INFO: task kexec:19038 is blocked on a mutex likely owned by task kexec:19038.
[ 517.020728] task:kexec state:D stack:0 pid:19038 tgid:19038 ppid:1 task_flags:0x400100 flags:0x00004002
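The "blocked on a mutex likely owned by task kexec:19038" line suggests the task is waiting on a mutex it already holds. Kernel mutexes are not recursive, so a second lock attempt from the same task never succeeds. As a userspace analogy only (this is not kernel code, and the function names are hypothetical), a minimal Python sketch of that behaviour:

```python
import threading

def second_acquire_succeeds(lock, timeout=0.1):
    """Take `lock`, then try to take it again from the same thread.

    Returns True if the second acquire succeeds within `timeout` seconds.
    """
    lock.acquire()
    try:
        got_it = lock.acquire(timeout=timeout)
        if got_it:
            lock.release()  # undo the nested acquire
        return got_it
    finally:
        lock.release()      # undo the outer acquire

# A plain Lock is non-reentrant, like a kernel mutex: the second acquire
# can only time out. Without a timeout it would block forever.
print(second_acquire_succeeds(threading.Lock()))   # False

# A reentrant lock is shown only for contrast.
print(second_acquire_succeeds(threading.RLock()))  # True
```

The hung-task report is the kernel-side equivalent of the first case, except that mutex_lock() has no timeout, so the task stays in state D.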
git bisect points to this merge commit:
commit 6376c0770656f3bdf7f411faf068371b6932aeca
Merge: 5e8bbb2caa4e 29857e6f4e30
Author: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Date: Tue May 27 09:01:26 2025 -0700
Merge tag 'timers-clocksource-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull clocksource updates from Thomas Gleixner:
"Updates for clocksource/clockevent drivers:
- The final conversion of text formatted device tree binding to
schemas
- A new driver fot the System Timer Module on S32G NXP SoCs
- A new driver fot the Econet HPT timer
- The usual improvements and device tree binding updates"
* tag 'timers-clocksource-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (31 commits)
clocksource/drivers/renesas-ostm: Unconditionally enable reprobe support
dt-bindings: timer: renesas,ostm: Document RZ/V2N (R9A09G056) support
dt-bindings: timer: Convert marvell,armada-370-timer to DT schema
dt-bindings: timer: Convert ti,keystone-timer to DT schema
dt-bindings: timer: Convert st,spear-timer to DT schema
dt-bindings: timer: Convert socionext,milbeaut-timer to DT schema
dt-bindings: timer: Convert snps,arc-timer to DT schema
dt-bindings: timer: Convert snps,archs-rtc to DT schema
dt-bindings: timer: Convert snps,archs-gfrc to DT schema
dt-bindings: timer: Convert lsi,zevio-timer to DT schema
dt-bindings: timer: Convert jcore,pit to DT schema
dt-bindings: timer: Convert img,pistachio-gptimer to DT schema
dt-bindings: timer: Convert ezchip,nps400-timer to DT schema
dt-bindings: timer: Convert cirrus,clps711x-timer to DT schema
dt-bindings: timer: Convert altr,timer-1.0 to DT schema
dt-bindings: timer: Add ESWIN EIC7700 CLINT
clocksource/drivers: Add EcoNet Timer HPT driver
dt-bindings: timer: Add EcoNet EN751221 "HPT" CPU Timer
dt-bindings: timer: Convert arm,mps2-timer to DT schema
dt-bindings: timer: Add Sophgo SG2044 ACLINT timer
…
Looking further into this merge, the only series I see with changes that may or may not be related to the hang is the following:
https://lore.kernel.org/all/20250429065337.117370076@xxxxxxxxxxxxx/
I am not very familiar with this subsystem and was hoping somebody can spot the offending commit and possibly provide a fix for this hang.
Note that we also tried rc3 to see whether a fix had been applied in a later RC, and we still see the same issue:
[ 525.390801] INFO: task systemd-shutdow:1 blocked for more than 122 seconds.
[ 525.474133] Tainted: G S 6.16.0-rc3.master.20250625.ol9.x86_64 #1
[ 525.570969] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 525.664681] task:systemd-shutdow state:D stack:0 pid:1 tgid:1 ppid:0 task_flags:0x400100 flags:0x00004002
[ 525.796878] Call Trace:
[ 525.826116] <TASK>
[ 525.851195] __schedule+0x2d1/0x730
[ 525.892917] schedule+0x27/0x80
[ 525.930478] schedule_preempt_disabled+0x15/0x30
[ 525.985718] __mutex_lock.constprop.0+0x4be/0x8a0
[ 526.041993] msi_domain_get_virq+0xcc/0x110
[ 526.092031] pci_msix_write_tph_tag+0x3c/0x100
[ 526.145186] pcie_tph_set_st_entry+0x125/0x1d0
[ 526.198346] bnxt_irq_affinity_release+0x35/0x50 [bnxt_en]
[ 526.264015] irq_set_affinity_notifier+0xe0/0x130
[ 526.320291] bnxt_free_irq+0x6e/0x110 [bnxt_en]
[ 526.374507] __bnxt_close_nic.isra.0+0x1eb/0x220 [bnxt_en]
[ 526.440175] bnxt_close+0x3a/0x100 [bnxt_en]
[ 526.491264] __dev_close_many+0xae/0x220
[ 526.538179] dev_close_many+0xc2/0x1b0
[ 526.583014] netif_close+0x9d/0xd0
[ 526.623693] bnxt_shutdown+0xb1/0xe0 [bnxt_en]
[ 526.676874] pci_device_shutdown+0x35/0x70
[ 526.725871] device_shutdown+0x118/0x1a0
[ 526.772788] kernel_restart+0x3a/0x70
[ 526.816588] __do_sys_reboot+0x150/0x250
[ 526.863504] do_syscall_64+0x84/0x940
[ 526.907300] ? __put_user_8+0xd/0x20
[ 526.950059] ? rseq_ip_fixup+0x90/0x1e0
[ 526.995937] ? task_mm_cid_work+0x1ad/0x220
[ 527.045971] ? __rseq_handle_notify_resume+0x35/0x90
[ 527.105367] ? arch_exit_to_user_mode_prepare.isra.0+0x98/0xb0
[ 527.175166] ? do_syscall_64+0xba/0x940
[ 527.221040] ? do_filp_open+0xd7/0x1a0
[ 527.265882] ? alloc_fd+0xba/0x110
[ 527.306556] ? do_sys_openat2+0xa4/0xf0
[ 527.352434] ? __x64_sys_openat+0x54/0xb0
[ 527.400389] ? arch_exit_to_user_mode_prepare.isra.0+0x9/0xb0
[ 527.469150] ? do_syscall_64+0xba/0x940
[ 527.515023] ? do_user_addr_fault+0x221/0x690
[ 527.567141] ? clear_bhb_loop+0x30/0x80
[ 527.613017] ? clear_bhb_loop+0x30/0x80
[ 527.658895] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 527.719332] RIP: 0033:0x7fc3ec504777
[ 527.762091] RSP: 002b:00007ffecd62c4f8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a9
[ 527.852685] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fc3ec504777
[ 527.938085] RDX: 0000000001234567 RSI: 0000000028121969 RDI: 00000000fee1dead
[ 528.023485] RBP: 00007ffecd62c700 R08: 0000000000000000 R09: 00007ffecd62b8e0
[ 528.108878] R10: 0000000000000001 R11: 0000000000000202 R12: 00007ffecd62c568
[ 528.194273] R13: 00007ffecd62c548 R14: 00007ffecd62c568 R15: 0000000000000000
[ 528.279672] </TASK>
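For anyone comparing the rc1 and rc3 traces, here is a minimal sketch that extracts the reliable frames (those not marked with "?") from each excerpt and reports the frames they share, starting at the mutex. The helper names and the regular expression are my own assumptions, not an existing tool. The excerpts are copied from the two traces above, plus one "?" line from the rc1 trace to show the filtering.

```python
import re

# Matches e.g. "[ 514.360954] __mutex_lock.constprop.0+0x4be/0x8a0" and
# captures the optional "? " marker (unreliable frame) and the symbol name.
FRAME_RE = re.compile(
    r"^\[\s*\d+\.\d+\]\s+(\? )?([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
)

def reliable_frames(log_text):
    """Return symbol names of frames not marked with '?', in trace order."""
    frames = []
    for line in log_text.splitlines():
        m = FRAME_RE.match(line.strip())
        if m and m.group(1) is None:
            frames.append(m.group(2))
    return frames

def common_prefix(a, b):
    """Return the leading run of entries that two frame lists share."""
    shared = []
    for x, y in zip(a, b):
        if x != y:
            break
        shared.append(x)
    return shared

RC1 = """
[ 514.360954] __mutex_lock.constprop.0+0x4be/0x8a0
[ 514.417232] msi_domain_get_virq+0xcc/0x110
[ 514.467279] pci_msix_write_tph_tag+0x3c/0x100
[ 514.520441] pcie_tph_set_st_entry+0x125/0x1d0
[ 514.573605] bnxt_irq_affinity_release+0x35/0x50 [bnxt_en]
[ 514.639258] irq_set_affinity_notifier+0xdd/0x130
[ 514.695534] bnxt_free_irq+0x6e/0x110 [bnxt_en]
[ 514.749746] __bnxt_close_nic.isra.0+0x1eb/0x220 [bnxt_en]
[ 514.815404] bnxt_close+0x3a/0x100 [bnxt_en]
[ 514.866498] __dev_close_many+0xab/0x220
[ 514.913423] __dev_change_flags+0x102/0x240
[ 515.233545] ? syscall_trace_enter+0x10c/0x1d0
"""

RC3 = """
[ 525.985718] __mutex_lock.constprop.0+0x4be/0x8a0
[ 526.041993] msi_domain_get_virq+0xcc/0x110
[ 526.092031] pci_msix_write_tph_tag+0x3c/0x100
[ 526.145186] pcie_tph_set_st_entry+0x125/0x1d0
[ 526.198346] bnxt_irq_affinity_release+0x35/0x50 [bnxt_en]
[ 526.264015] irq_set_affinity_notifier+0xe0/0x130
[ 526.320291] bnxt_free_irq+0x6e/0x110 [bnxt_en]
[ 526.374507] __bnxt_close_nic.isra.0+0x1eb/0x220 [bnxt_en]
[ 526.440175] bnxt_close+0x3a/0x100 [bnxt_en]
[ 526.491264] __dev_close_many+0xae/0x220
[ 526.538179] dev_close_many+0xc2/0x1b0
"""

shared = common_prefix(reliable_frames(RC1), reliable_frames(RC3))
for name in shared:
    print(name)
```

On these excerpts the shared run goes from __mutex_lock.constprop.0 up through __dev_close_many. The two traces only diverge above that, in how the close was triggered (the ioctl path in rc1, the bnxt_shutdown path in rc3).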
--
Himanshu Madhani Oracle Linux Engineering