Re: netconsole: HARDIRQ-safe -> HARDIRQ-unsafe lock order warning

From: Pavel Begunkov
Date: Fri Aug 15 2025 - 06:49:06 EST


On 8/15/25 01:23, Jakub Kicinski wrote:
On Thu, 14 Aug 2025 03:16:11 -0700 Breno Leitao wrote:
2.2) netpoll // net poll will call the network subsystem to send the packet
2.3) lock(&fq->lock); // Try to get the lock while the lock was already held

The report for reference:

https://lore.kernel.org/all/fb38cfe5153fd67f540e6e8aff814c60b7129480.camel@xxxxxx/

Where does netpoll take fq->lock ?

the dependencies between the lock to be acquired
[ 107.985514] and HARDIRQ-irq-unsafe lock:
[ 107.985531] -> (&fq->lock){+.-.}-{3:3} {
...
[ 107.988053] ... acquired at:
[ 107.988054] check_prev_add+0xfb/0xca0
[ 107.988058] validate_chain+0x48c/0x530
[ 107.988061] __lock_acquire+0x550/0xbc0
[ 107.988064] lock_acquire.part.0+0xa1/0x210
[ 107.988068] _raw_spin_lock_bh+0x38/0x50
[ 107.988070] ieee80211_queue_skb+0xfd/0x350 [mac80211]
[ 107.988198] __ieee80211_xmit_fast+0x202/0x360 [mac80211]
[ 107.988314] ieee80211_xmit_fast+0xfb/0x1f0 [mac80211]
[ 107.988424] __ieee80211_subif_start_xmit+0x14e/0x3d0 [mac80211]
[ 107.988530] ieee80211_subif_start_xmit+0x46/0x230 [mac80211]
[ 107.988634] netpoll_start_xmit+0x8b/0xd0
[ 107.988638] __netpoll_send_skb+0x329/0x3b0
[ 107.988641] write_msg+0x104/0x120 [netconsole]
[ 107.988647] console_emit_next_record+0x203/0x250
[ 107.988652] console_flush_all+0x24d/0x370
[ 107.988657] console_unlock+0x66/0x130
[ 107.988662] vprintk_emit+0x142/0x360
[ 107.988666] _printk+0x5b/0x80
[ 107.988671] enabled_store.cold+0x7e/0x83 [netconsole]
[ 107.988677] configfs_write_iter+0xbd/0x120 [configfs]
[ 107.988683] vfs_write+0x213/0x520
[ 107.988689] ksys_write+0x69/0xe0
[ 107.988691] do_syscall_64+0x94/0xa10
[ 107.988695] entry_SYSCALL_64_after_hwframe+0x4b/0x53

We started hitting this a lot in the CI as well, lockdep must have
gotten more sensitive in 6.17. Last I checked lockdep didn't understand

FWIW, I remember there were similar reports last year but with
xmit lock.

that we manually test for nesting with netif_local_xmit_active().

Looks like Breno tried to simplify it, the original syz report
gave the following scenario:

[ 107.984942] Chain exists of:
console_owner --> target_list_lock --> &fq->lock

[ 107.984947] Possible interrupt unsafe locking scenario:
[ 107.984948]        CPU0                    CPU1
[ 107.984949]        ----                    ----
[ 107.984950]   lock(&fq->lock);
[ 107.984952]                                local_irq_disable();
[ 107.984952]                                lock(console_owner);
[ 107.984954]                                lock(target_list_lock);
[ 107.984956]   <Interrupt>
[ 107.984957]     lock(console_owner);


Seems like with the fq->lock trace I pasted above we can get sth like:

       CPU0                    CPU1
       ----                    ----
  lock(&fq->lock);
                               local_irq_disable();
                               lock(console_owner);
                               lock(target_list_lock);
                               lock(&fq->lock);
  <Interrupt>
    lock(console_owner);

Nesting checks won't help with this one.
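
For illustration only, here is a rough userspace sketch of the rule lockdep is
applying in the report above. It is not kernel code and not how lockdep is
implemented. It models the three lock classes from the chain and flags the case
where a hardirq-safe lock can lead, through recorded dependencies, to a
hardirq-unsafe one. All names (lock_class, record_dep, has_irq_inversion) are
made up for this sketch, and the safe/unsafe flags are assumptions read off the
scenario: console_owner is taken from an interrupt, and fq->lock is taken with
interrupts enabled. target_list_lock is assumed to be neither.

```c
#include <stdbool.h>

/* Toy model of lock classes; hypothetical, not the kernel's data structures. */
enum { CONSOLE_OWNER, TARGET_LIST_LOCK, FQ_LOCK, NR_LOCKS };

struct lock_class {
	const char *name;
	bool hardirq_safe;   /* assumed: was acquired in hardirq context */
	bool hardirq_unsafe; /* assumed: was acquired with hardirqs enabled */
};

struct lock_class classes[NR_LOCKS] = {
	[CONSOLE_OWNER]    = { "console_owner",    true,  false },
	[TARGET_LIST_LOCK] = { "target_list_lock", false, false },
	[FQ_LOCK]          = { "&fq->lock",        false, true  },
};

/* dep[a][b] is true when b was acquired while a was held. */
bool dep[NR_LOCKS][NR_LOCKS];

void record_dep(int held, int acquired)
{
	dep[held][acquired] = true;
}

/* Depth-first walk: can 'from' reach any hardirq-unsafe class? */
bool reaches_unsafe(int from, bool *seen)
{
	if (seen[from])
		return false;
	seen[from] = true;

	for (int to = 0; to < NR_LOCKS; to++) {
		if (!dep[from][to])
			continue;
		if (classes[to].hardirq_unsafe || reaches_unsafe(to, seen))
			return true;
	}
	return false;
}

/* True if some hardirq-safe class leads to a hardirq-unsafe class. */
bool has_irq_inversion(void)
{
	for (int i = 0; i < NR_LOCKS; i++) {
		bool seen[NR_LOCKS] = { false };

		if (classes[i].hardirq_safe && reaches_unsafe(i, seen))
			return true;
	}
	return false;
}
```

With only console_owner --> target_list_lock recorded, the check stays quiet.
Once target_list_lock --> &fq->lock is recorded as well (the netpoll xmit path
in the trace), the chain connects a hardirq-safe lock to a hardirq-unsafe one
and the check fires. The check only looks at the dependency graph, so a runtime
nesting test on the xmit path does not change its outcome.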

--
Pavel Begunkov