Full dynticks needs evtdev set before marking cpu online.

From: Robin Holt
Date: Wed May 08 2013 - 19:57:49 EST


Thomas,

We are seeing failures booting medium sized machines, which I think come
from a change in the expectations that dyntick puts on x86's
start_secondary.

During boot of cpus, we see an occasional panic in tick_do_broadcast at

        if (!cpumask_empty(mask)) {
                /*
                 * It might be necessary to actually check whether the devices
                 * have different broadcast functions. For now, just use the
                 * one of the first device. This works as long as we have this
                 * misfeature only on x86 (lapic)
                 */
                td = &per_cpu(tick_cpu_device, cpumask_first(mask));
                td->evtdev->broadcast(mask);
                    ^^^^^^
                    NULL


This is called from:
static void tick_do_periodic_broadcast(void)
{
        raw_spin_lock(&tick_broadcast_lock);

        cpumask_and(tmpmask, cpu_online_mask, tick_broadcast_mask);
        tick_do_broadcast(tmpmask);
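
To make the selection step concrete, here is a minimal user-space sketch of
it. This is not kernel code: every model_* name and MODEL_NR_CPUS is
hypothetical, plain unsigned ints stand in for cpumasks, and a counter
stands in for the real broadcast hook. It only illustrates that the first
cpu in (online & broadcast) is the one whose evtdev gets dereferenced.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_NR_CPUS 8 /* hypothetical size for this model */

/* Hypothetical stand-in for struct clock_event_device. */
struct model_evtdev {
        int broadcasts; /* counts calls; replaces the real broadcast hook */
};

/* Hypothetical stand-in for the per-cpu tick_cpu_device. */
struct model_tick_device {
        struct model_evtdev *evtdev; /* NULL until the cpu registers a device */
};

static struct model_tick_device model_tick_cpu_device[MODEL_NR_CPUS];

/* Lowest set bit in the mask, or -1 if empty (models cpumask_first). */
static int model_mask_first(unsigned int mask)
{
        int cpu;

        for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
                if (mask & (1u << cpu))
                        return cpu;
        return -1;
}

/*
 * Models the mask intersection plus the device lookup.
 * Returns 0 if the mask is empty, 1 if the broadcast ran, and -1 at the
 * point where the quoted kernel code would dereference a NULL evtdev.
 */
static int model_do_periodic_broadcast(unsigned int online,
                                       unsigned int broadcast)
{
        unsigned int mask = online & broadcast;
        struct model_tick_device *td;
        int first;

        assert((mask >> MODEL_NR_CPUS) == 0); /* only model cpus allowed */

        first = model_mask_first(mask);
        if (first < 0)
                return 0;

        td = &model_tick_cpu_device[first];
        if (td->evtdev == NULL)
                return -1; /* the real code has no such check */

        td->evtdev->broadcasts++;
        return 1;
}
```

In this model, a cpu that is in both masks but has not yet had its evtdev
assigned produces the -1 result, which corresponds to the panic above.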


Now the problem. In start_secondary, we have:
272         lock_vector_lock();
273         set_cpu_online(smp_processor_id(), true);
274         unlock_vector_lock();
275         per_cpu(cpu_state, smp_processor_id()) = CPU_ONLINE;
276         x86_platform.nmi_init();
277
278         /* enable local interrupts */
279         local_irq_enable();
280
281         /* to prevent fake stack check failure in clock setup */
282         boot_init_stack_canary();
283
284         x86_cpuinit.setup_percpu_clockev();

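So the cpu is marked online on line 273, but its evtdev is not set until
line 284. In between, a periodic broadcast on another cpu can pick the
new cpu out of cpu_online_mask and dereference its NULL evtdev. This
ordering has been in start_secondary for a considerable period of time,
so I think the problem is just being revealed now.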

It does not show up with a normal config, but taking a 'make
x86_64_defconfig' kernel and changing CONFIG_MAXSMP seems to change boot
timing enough to make it reproducible on machines with 4 or more sockets.

The following makes it boot, but I am not sure if this is the right
thing to do.

$ git diff
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 9c73b51..8456432 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -264,6 +264,8 @@ notrace static void __cpuinit start_secondary(void *unused)
          */
         check_tsc_sync_target();
 
+        x86_cpuinit.setup_percpu_clockev();
+
         /*
          * We need to hold vector_lock so there the set of online cpus
          * does not change while we are assigning vectors to cpus. Holding
@@ -281,8 +283,6 @@ notrace static void __cpuinit start_secondary(void *unused)
         /* to prevent fake stack check failure in clock setup */
         boot_init_stack_canary();
 
-        x86_cpuinit.setup_percpu_clockev();
-
         wmb();
         cpu_startup_entry(CPUHP_ONLINE);
 }
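
To spell out why moving the call closes the window, here is a small
user-space sketch of the two orderings. It is only a model under stated
assumptions: the sketch_* names are hypothetical, a bool stands in for the
cpu's bit in cpu_online_mask, and a string pointer stands in for
td->evtdev. It checks what a broadcast on another cpu would observe if it
ran between the two bring-up steps.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the state another cpu can observe. */
struct sketch_cpu {
        bool online;        /* models the bit in cpu_online_mask */
        const char *evtdev; /* models td->evtdev; NULL until clock setup */
};

enum sketch_order {
        SKETCH_ONLINE_FIRST,  /* ordering before the diff above */
        SKETCH_CLOCKEV_FIRST  /* ordering after the diff above */
};

/* True if a broadcast running right now would hit a NULL evtdev. */
static bool sketch_broadcast_would_oops(const struct sketch_cpu *c)
{
        return c->online && c->evtdev == NULL;
}

/*
 * Runs the two bring-up steps in the requested order and reports whether
 * the state between them is one in which a broadcast would oops.
 */
static bool sketch_bringup_has_window(struct sketch_cpu *c,
                                      enum sketch_order order)
{
        bool window;

        assert(c != NULL);
        c->online = false;
        c->evtdev = NULL;

        /* first step */
        if (order == SKETCH_ONLINE_FIRST)
                c->online = true;
        else
                c->evtdev = "lapic";

        /* what another cpu could observe between the two steps */
        window = sketch_broadcast_would_oops(c);

        /* second step */
        if (order == SKETCH_ONLINE_FIRST)
                c->evtdev = "lapic";
        else
                c->online = true;

        return window;
}
```

With SKETCH_ONLINE_FIRST the model reports a window, and with
SKETCH_CLOCKEV_FIRST it does not. The sketch says nothing about whether
setup_percpu_clockev is safe to call that early, which is the part I am
unsure about.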


Thanks,
Robin Holt