[PATCH] IRQ, cpu-hotplug: Fix a race between CPU hotplug and IRQ desc alloc/free
From: Huang, Ying
Date:  Mon Sep 04 2017 - 04:53:21 EST
From: Huang Ying <ying.huang@xxxxxxxxx>
When developing code to boot up some APs (Application Processors)
asynchronously, I encountered the following kernel panic.  Checking
the code showed that, during CPU hotplug, irq_to_desc() may return a
NULL IRQ descriptor for an IRQ yielded by for_each_active_irq(), so I
added the corresponding NULL pointer checks to fix this.  I also found
that irq_migrate_all_off_this_cpu() walks the active IRQs without
holding sparse_irq_lock, so I fixed that too.
"
BUG: unable to handle kernel NULL pointer dereference at 00000000000000a4
IP: _raw_spin_lock_irq+0x1e/0x40
PGD 0
P4D 0
Oops: 0002 [#1] SMP
Modules linked in:
CPU: 93 PID: 713 Comm: cpuhp/93 Not tainted 4.13.0-rc7-00261-g3760d3d #1
Hardware name: Intel Corporation BRICKLAND/BRICKLAND, BIOS BRBDXSD1.86B.0335.R00.1601291644 01/29/2016
task: ffff883f930e2680 task.stack: ffffc9000ef00000
RIP: 0010:_raw_spin_lock_irq+0x1e/0x40
RSP: 0000:ffffc9000ef03de0 EFLAGS: 00010046
RAX: 0000000000000000 RBX: 0000000000000010 RCX: 0000000000000010
RDX: 0000000000000001 RSI: 0000000000000010 RDI: 00000000000000a4
RBP: ffffc9000ef03de0 R08: ffff881036801240 R09: 0000000000000000
R10: 0000000000000040 R11: ffff881036801268 R12: 00000000000000a4
R13: 0000000000000000 R14: 000000000000005d R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff884044540000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000000000a4 CR3: 000000407ee09000 CR4: 00000000003406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 irq_affinity_online_cpu+0x46/0xe0
 ? irq_migrate_all_off_this_cpu+0x2a0/0x2a0
 cpuhp_invoke_callback+0x80/0x400
 cpuhp_up_callbacks+0x36/0xc0
 ? smpboot_thread_fn+0x34/0x1f0
 ? smpboot_thread_fn+0x12d/0x1f0
 cpuhp_thread_fun+0xd5/0xe0
 smpboot_thread_fn+0x128/0x1f0
 kthread+0x114/0x150
 ? sort_range+0x30/0x30
 ? kthread_create_on_node+0x40/0x40
 ret_from_fork+0x25/0x30
Code: 89 e5 e8 26 8b 6f ff 5d c3 0f 1f 40 00 0f 1f 44 00 00 55 48 89 e5 fa 66 0f 1f 44 00 00 65 ff 05 49 4a 63 7e 31 c0 ba 01 00 00 00 <f0> 0f b1 17 85 c0 75 02 5d c3 89 c6 e8 21 71 6f ff 66 90 5d c3
RIP: _raw_spin_lock_irq+0x1e/0x40 RSP: ffffc9000ef03de0
CR2: 00000000000000a4
---[ end trace a9eacc0758f1f81e ]---
Kernel panic - not syncing: Fatal exception
"
Signed-off-by: "Huang, Ying" <ying.huang@xxxxxxxxx>
---
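The failure mode described above can be illustrated with a minimal userspace sketch. It is a hypothetical model, not kernel code. The names `allocated`, `desc_table`, `mark_active_only()`, `publish()` and `walk_active()` are invented for illustration. The sketch assumes, as the NULL checks in the patch imply, that a slot can be reported as active while its descriptor pointer is still NULL. The walker mirrors the patched loops by skipping such slots instead of dereferencing a NULL pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a sparse descriptor table (not kernel code). */
#define NR_SLOTS 8

struct fake_desc {
	int hits;	/* stands in for state touched under desc->lock */
};

static bool allocated[NR_SLOTS];		/* models the active-IRQ bitmap */
static struct fake_desc *desc_table[NR_SLOTS];	/* models irq_to_desc() lookups */
static struct fake_desc storage[NR_SLOTS];

/* Assumed in-between state: the slot is marked active, but its
 * descriptor pointer has not been published yet (still NULL). */
static void mark_active_only(unsigned int slot)
{
	allocated[slot] = true;
}

/* Fully set up slot: descriptor published and marked active. */
static void publish(unsigned int slot)
{
	desc_table[slot] = &storage[slot];
	allocated[slot] = true;
}

/* Mirrors the patched loops: skip active slots whose descriptor is
 * missing. Returns how many descriptors were actually visited. */
static unsigned int walk_active(void)
{
	unsigned int visited = 0;

	for (unsigned int slot = 0; slot < NR_SLOTS; slot++) {
		struct fake_desc *desc;

		if (!allocated[slot])
			continue;
		desc = desc_table[slot];
		if (!desc)	/* without this check, desc->hits would be a NULL dereference */
			continue;
		desc->hits++;
		visited++;
	}
	return visited;
}
```

After publish(1), publish(3) and mark_active_only(5), slot 5 is active but has no descriptor; walk_active() visits the two published descriptors and leaves slot 5 alone.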
 kernel/irq/cpuhotplug.c | 6 ++++++
 1 file changed, 6 insertions(+)
diff --git a/kernel/irq/cpuhotplug.c b/kernel/irq/cpuhotplug.c
index 638eb9c83d9f..af9029625271 100644
--- a/kernel/irq/cpuhotplug.c
+++ b/kernel/irq/cpuhotplug.c
@@ -129,10 +129,13 @@ void irq_migrate_all_off_this_cpu(void)
 	struct irq_desc *desc;
 	unsigned int irq;
 
+	irq_lock_sparse();
 	for_each_active_irq(irq) {
 		bool affinity_broken;
 
 		desc = irq_to_desc(irq);
+		if (!desc)
+			continue;
 		raw_spin_lock(&desc->lock);
 		affinity_broken = migrate_one_irq(desc);
 		raw_spin_unlock(&desc->lock);
@@ -142,6 +145,7 @@ void irq_migrate_all_off_this_cpu(void)
 					    irq, smp_processor_id());
 		}
 	}
+	irq_unlock_sparse();
 }
 
 static void irq_restore_affinity_of_irq(struct irq_desc *desc, unsigned int cpu)
@@ -179,6 +183,8 @@ int irq_affinity_online_cpu(unsigned int cpu)
 	irq_lock_sparse();
 	for_each_active_irq(irq) {
 		desc = irq_to_desc(irq);
+		if (!desc)
+			continue;
 		raw_spin_lock_irq(&desc->lock);
 		irq_restore_affinity_of_irq(desc, cpu);
 		raw_spin_unlock_irq(&desc->lock);
-- 
2.13.2
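The patch also brackets the loop in irq_migrate_all_off_this_cpu() with irq_lock_sparse()/irq_unlock_sparse(), matching what irq_affinity_online_cpu() already does. The userspace sketch below models that idea with a pthread mutex. It is not kernel code: `table_lock`, `alloc_slot()`, `free_slot()`, `walk_locked()` and `churn()` are hypothetical names, and the sketch assumes that every writer of the table takes the same lock, so a walker holding it for the whole loop never observes a half-updated slot. The NULL check is kept, as in the patch, even though under this model's locking it should never fire.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model (not kernel code); table_lock plays sparse_irq_lock. */
#define NR_SLOTS 64

struct slot_desc {
	int value;
};

static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;
static bool active[NR_SLOTS];
static struct slot_desc *table[NR_SLOTS];

/* Writer: publishes pointer and active bit in one critical section. */
static void alloc_slot(unsigned int i)
{
	struct slot_desc *d = malloc(sizeof(*d));

	if (!d)
		return;
	d->value = (int)i;
	pthread_mutex_lock(&table_lock);
	table[i] = d;
	active[i] = true;
	pthread_mutex_unlock(&table_lock);
}

/* Writer: retracts bit and pointer together, then frees outside the lock.
 * Walkers only dereference under the lock, so none can still use d. */
static void free_slot(unsigned int i)
{
	struct slot_desc *d;

	pthread_mutex_lock(&table_lock);
	d = table[i];
	active[i] = false;
	table[i] = NULL;
	pthread_mutex_unlock(&table_lock);
	free(d);
}

/* Reader: holds the lock across the whole walk, like the patched
 * irq_migrate_all_off_this_cpu(). Returns the number of active-but-NULL
 * slots it saw, which should stay 0. */
static unsigned int walk_locked(void)
{
	unsigned int inconsistent = 0;

	pthread_mutex_lock(&table_lock);
	for (unsigned int i = 0; i < NR_SLOTS; i++) {
		if (!active[i])
			continue;
		if (!table[i]) {	/* defensive check, as in the patch */
			inconsistent++;
			continue;
		}
		(void)table[i]->value;	/* the kernel would take desc->lock here */
	}
	pthread_mutex_unlock(&table_lock);
	return inconsistent;
}

/* Background thread that keeps allocating and freeing every slot. */
static void *churn(void *arg)
{
	(void)arg;
	for (int iter = 0; iter < 1000; iter++) {
		for (unsigned int i = 0; i < NR_SLOTS; i++) {
			alloc_slot(i);
			free_slot(i);
		}
	}
	return NULL;
}
```

Calling walk_locked() repeatedly while another thread runs churn() should always report zero inconsistent slots, because each writer updates the active bit and the pointer inside a single critical section that the walker also holds for its entire loop.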