[BUG] do_IRQ: 7.33 No irq handler for vector

From: jianchao.wang
Date: Fri Jan 19 2018 - 09:22:41 EST



Hi Thomas

While running a CPU hotplug stress test, I found the following message in the log on my machine.

[ 267.161043] do_IRQ: 7.33 No irq handler for vector
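
For reference, the two numbers in this message appear to be the CPU and the vector: as far as I can tell, 4.15-era x86 kernels print it from do_IRQ() with a "%d.%d" format fed by smp_processor_id() and the vector number, so "7.33" reads as CPU 7, vector 33. A minimal sketch of pulling those fields out of a dmesg line follows; the helper name parse_no_handler is hypothetical and only for illustration.

```python
import re

# Matches e.g. "[ 267.161043] do_IRQ: 7.33 No irq handler for vector"
# Assumption: the message format is "<func>: <cpu>.<vector> No irq handler for vector".
_NO_HANDLER = re.compile(r"do_IRQ: (\d+)\.(\d+) No irq handler for vector")


def parse_no_handler(line):
    """Return (cpu, vector) from a 'No irq handler' dmesg line, or None."""
    m = _NO_HANDLER.search(line)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))


cpu, vector = parse_no_handler(
    "[ 267.161043] do_IRQ: 7.33 No irq handler for vector"
)
print(cpu, vector)  # prints: 7 33
```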

I added a dump_stack() call right after that message and got the following log:

[ 267.161043] do_IRQ: 7.33 No irq handler for vector
[ 267.161045] CPU: 7 PID: 52 Comm: migration/7 Not tainted 4.15.0-rc7+ #27
[ 267.161045] Hardware name: LENOVO 10MLS0E339/3106, BIOS M1AKT22A 06/27/2017
[ 267.161046] Call Trace:
[ 267.161047] <IRQ>
[ 267.161052] dump_stack+0x7c/0xb5
[ 267.161054] do_IRQ+0xb9/0xf0
[ 267.161056] common_interrupt+0xa2/0xa2
[ 267.161057] </IRQ>
[ 267.161059] RIP: 0010:multi_cpu_stop+0xb0/0x120
[ 267.161060] RSP: 0018:ffffbb6c81af7e70 EFLAGS: 00000202 ORIG_RAX: ffffffffffffffde
[ 267.161061] RAX: 0000000000000001 RBX: 0000000000000004 RCX: 0000000000000000
[ 267.161062] RDX: 0000000000000006 RSI: ffffffff898c4591 RDI: 0000000000000202
[ 267.161063] RBP: ffffbb6c826e7c88 R08: ffff991abc1256bc R09: 0000000000000005
[ 267.161063] R10: ffffbb6c81af7db8 R11: ffffffff89c91d20 R12: 0000000000000001
[ 267.161064] R13: ffffbb6c826e7cac R14: 0000000000000003 R15: 0000000000000000
[ 267.161067] ? cpu_stop_queue_work+0x90/0x90
[ 267.161068] cpu_stopper_thread+0x83/0x100
[ 267.161070] smpboot_thread_fn+0x161/0x220
[ 267.161072] kthread+0xf5/0x130
[ 267.161073] ? sort_range+0x20/0x20
[ 267.161074] ? kthread_associate_blkcg+0xe0/0xe0
[ 267.161076] ret_from_fork+0x24/0x30

The interrupt arrived right after interrupts were re-enabled in multi_cpu_stop().

0xffffffff8112d655 is in multi_cpu_stop (/home/will/u04/source_code/linux-block/kernel/stop_machine.c:223).
218 */
219 touch_nmi_watchdog();
220 }
221 } while (curstate != MULTI_STOP_EXIT);
222
223 local_irq_restore(flags);
224 return err;
225 }

Vector 33 here is used by an NVMe card.

124:  616993      0      0      0      0      0      0      0  IR-PCI-MSI 1048576-edge  nvme0q0, nvme0q1
125:      44      0      0      0      0      0      0      0  IR-PCI-MSI 327680-edge   xhci_hcd
126:       0 620871      0      0      0      0      0      0  IR-PCI-MSI 1048577-edge  nvme0q2
127:       0      0 641541      0      0      0      0      0  IR-PCI-MSI 1048578-edge  nvme0q3
128:       0      0      0 577836      0      0      0      0  IR-PCI-MSI 1048579-edge  nvme0q4
129:       0      0      0      0 554206      0      0      0  IR-PCI-MSI 1048580-edge  nvme0q5
130:       0      0      0      0      0 455021      0      0  IR-PCI-MSI 1048581-edge  nvme0q6
131:       0      0      0      0      0      0 273111      0  IR-PCI-MSI 1048582-edge  nvme0q7
132:       0      0      0      0      0      0      0 120987  IR-PCI-MSI 1048583-edge  nvme0q8
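
To narrow down which of these lines is relevant to CPU 7, one can pick the rows whose counter for that CPU is non-zero. The sketch below is only an illustration: the listing has no header row, so it assumes eight per-CPU columns (CPU0 to CPU7), and the helper name irqs_on_cpu is hypothetical. Note that the counters only show where interrupts were delivered; they do not show the vector number.

```python
def irqs_on_cpu(rows, cpu, ncpus=8):
    """Return IRQ numbers whose per-CPU counter for `cpu` is non-zero.

    Assumption: each row is "<irq>: <ncpus counters> <chip> <hwirq>-<type> <names>".
    """
    hits = []
    for row in rows:
        fields = row.split()
        irq = fields[0].rstrip(":")
        counts = [int(x) for x in fields[1:1 + ncpus]]
        if counts[cpu] > 0:
            hits.append(irq)
    return hits


# Three rows copied from the listing above.
rows = [
    "124: 616993 0 0 0 0 0 0 0 IR-PCI-MSI 1048576-edge nvme0q0, nvme0q1",
    "125: 44 0 0 0 0 0 0 0 IR-PCI-MSI 327680-edge xhci_hcd",
    "132: 0 0 0 0 0 0 0 120987 IR-PCI-MSI 1048583-edge nvme0q8",
]
print(irqs_on_cpu(rows, cpu=7))  # prints: ['132']
```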

Here is the irq debugfs output for that interrupt:

handler:  handle_edge_irq
device:   0000:02:00.0
status:   0x00004000
istate:   0x00000000
ddepth:   0
wdepth:   0
dstate:   0x01608200
            IRQD_ACTIVATED
            IRQD_IRQ_STARTED
            IRQD_SINGLE_TARGET
            IRQD_MOVE_PCNTXT
            IRQD_AFFINITY_MANAGED
node:     0
affinity: 7
effectiv: 7
pending:
domain:  INTEL-IR-MSI-1-2
 hwirq:   0x100007
 chip:    IR-PCI-MSI
  flags:   0x10
             IRQCHIP_SKIP_SET_WAKE
 parent:
    domain:  INTEL-IR-1
     hwirq:   0x1a0000
     chip:    INTEL-IR
      flags:   0x0
     parent:
        domain:  VECTOR
         hwirq:   0x84
         chip:    APIC
          flags:   0x0
         Vector:    33
         Target:     7
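
As a cross-check, the PCI-MSI hwirq above (0x100007, i.e. 1048583) can be decoded back to a PCI device and MSI entry. The sketch below assumes the encoding used by pci_msi_domain_calc_hwirq() in kernels of this era (entry number in the low 11 bits, PCI_DEVID above it, PCI domain from bit 27); the helper name decode_pci_msi_hwirq is hypothetical.

```python
def decode_pci_msi_hwirq(hwirq):
    """Split a PCI-MSI hwirq into (domain, bus, dev, fn, entry).

    Assumed layout (4.15-era pci_msi_domain_calc_hwirq):
    entry_nr | PCI_DEVID(bus, devfn) << 11 | pci_domain << 27
    """
    entry = hwirq & 0x7FF           # low 11 bits: MSI/MSI-X entry number
    devid = (hwirq >> 11) & 0xFFFF  # 16-bit PCI_DEVID: bus << 8 | devfn
    domain = hwirq >> 27
    bus, devfn = devid >> 8, devid & 0xFF
    return domain, bus, devfn >> 3, devfn & 0x7, entry


domain, bus, dev, fn, entry = decode_pci_msi_hwirq(0x100007)
print(f"{domain:04x}:{bus:02x}:{dev:02x}.{fn} entry {entry}")
# prints: 0000:02:00.0 entry 7
```

Under that assumption the result agrees with the "device: 0000:02:00.0" line, and entry 7 corresponds to the 1048583-edge row (nvme0q8) in the interrupt listing.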

Thanks
Jianchao