Re: i386 cpu hotplug bug - instant reboot when onlining secondary

From: Nathan Lynch
Date: Mon Feb 27 2006 - 02:42:20 EST


Zwane Mwaikambo wrote:
> On Sun, 19 Feb 2006, Nathan Lynch wrote:
>
> > On a dual P3 Xeon machine, offlining and then onlining a cpu makes the
> > box instantly reboot. I've been seeing this throughout the 2.6.16-rc
> > series, but wasn't able to collect more information until now. Not
> > sure when this last worked, unfortunately.
> >
> > With the debugging patch below, I get this on serial console:
>
> Does 2.6.14 work? Also, I wonder if it gets out of the trampoline...

2.6.14 works (albeit with an APIC error reported). When retesting
2.6.16-rc4 with your patch on top of my debugging patch, I don't see the
"startup_secondary" line:

[17179709.100000] CPU 1 is now offline
[17179714.636000] Booting processor 1/1 eip 3000
[17179714.688000] CPU 1 irqstacks, hard=7837f000 soft=78377000
[17179714.756000] Setting warm reset code and vector.
[17179714.812000] 1.
[17179714.836000] 2.
[17179714.860000] 3.
[17179714.880000] Asserting INIT.
[17179714.916000] Waiting for send to finish...
[17179714.968000] +<7>Deasserting INIT.
[17179715.024000] Waiting for send to finish...
[17179715.072000] +<7>#startup loops: 2.
[17179715.116000] Sending STARTUP #1.
[17179715.160000] After apic_write.
[17179715.196000] Doing apic_write_around for target chip...
[17179715.260000] Doing apic_write_around to kick the second...
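
For reference, a rough sketch of how the "eip 3000" in the log above relates
to the "Setting warm reset code and vector" and "Sending STARTUP" steps.
It assumes the usual x86 convention: the STARTUP IPI carries the 4 KiB page
number of the real-mode entry point, and the BIOS warm-reset vector at
0x467/0x469 holds the same address as an offset:segment pair. The struct and
function names are made up for illustration; this is a standalone userspace
calculation, not the code in smpboot.c.

```c
#include <stdint.h>

/* Low-memory locations of the BIOS warm-reset vector (offset, then segment). */
#define WARM_RESET_OFFSET_ADDR  0x467u
#define WARM_RESET_SEGMENT_ADDR 0x469u

/* Hypothetical container for the values derived from the trampoline address. */
struct startup_params {
    uint8_t  sipi_vector;   /* vector field of the STARTUP IPI (page number) */
    uint16_t warm_segment;  /* value that would be stored at 0x469 */
    uint16_t warm_offset;   /* value that would be stored at 0x467 */
};

/*
 * Derive the STARTUP IPI vector and warm-reset vector contents from the
 * physical address of the real-mode trampoline.
 * Returns 0 on success, -1 if the address is not 4 KiB aligned or is not
 * below 1 MiB (a STARTUP IPI cannot express such an address).
 */
static int startup_params_from_eip(uint32_t start_eip, struct startup_params *out)
{
    if ((start_eip & 0xfffu) != 0 || start_eip >= 0x100000u)
        return -1;

    out->sipi_vector  = (uint8_t)(start_eip >> 12);   /* 0x3000 -> 0x03 */
    out->warm_segment = (uint16_t)(start_eip >> 4);   /* 0x3000 -> 0x0300 */
    out->warm_offset  = (uint16_t)(start_eip & 0xfu); /* 0 when page aligned */
    return 0;
}
```

For the trampoline at 0x3000 this gives STARTUP vector 0x03 and a warm-reset
vector of 0300:0000, so a secondary that responds to either INIT or STARTUP
should begin executing at physical 0x3000 in real mode.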

>
> Index: linux-2.6.16-rc2/arch/i386/kernel/smpboot.c
> ===================================================================
> RCS file: /home/cvsroot/linux-2.6.16-rc2/arch/i386/kernel/smpboot.c,v
> retrieving revision 1.1.1.1
> diff -u -p -B -r1.1.1.1 smpboot.c
> --- linux-2.6.16-rc2/arch/i386/kernel/smpboot.c 11 Feb 2006 18:55:06 -0000 1.1.1.1
> +++ linux-2.6.16-rc2/arch/i386/kernel/smpboot.c 21 Feb 2006 16:19:22 -0000
> @@ -514,6 +514,7 @@ static void __devinit start_secondary(vo
>  	cpu_init();
>  	preempt_disable();
>  	smp_callin();
> +	Dprintk("startup_secondary\n");
>  	while (!cpu_isset(smp_processor_id(), smp_commenced_mask))
>  		rep_nop();
>  	setup_secondary_APIC_clock();