Re: Regression introduced with 14e568e78f6f80ca1e27256641ddf524c7dbdc51 (stop_machine: Use smpboot threads)

From: Thomas Gleixner
Date: Tue Feb 26 2013 - 07:36:44 EST


On Fri, 22 Feb 2013, Konrad Rzeszutek Wilk wrote:
>
> I don't know if this is b/c the Xen code is missing something or
> expects something that never happened. I hadn't looked at your
> patch in any detail (was going to do that on Monday).
>
> Either way, if I boot a HVM guest with PV extensions (aka PVHVM)
> this is what I get:
> [ 0.133081] cpu 1 spinlock event irq 71
> [ 0.134049] smpboot: Booting Node 0, Processors #1[ 0.008000] installing Xen timer for CPU 1
> [ 0.205154] Brought up 2 CPUs
> [ 0.205156] smpboot: Total of 2 processors activated (16021.74 BogoMIPS)
>
> [ 28.134000] BUG: soft lockup - CPU#0 stuck for 23s! [migration/0:8]
> [ 28.134000] Modules linked in:
> [ 28.134000] CPU 0
> [ 28.134000] Pid: 8, comm: migration/0 Tainted: G W 3.8.0upstream-06472-g6661875-dirty #1 Xen HVM domU
> [ 28.134000] RIP: 0010:[<ffffffff8110711b>] [<ffffffff8110711b>] stop_machine_cpu_stop+0x7b/0xf0

So the migration thread loops in stop_machine_cpu_stop(). Now the
interesting question is what work was scheduled for that cpu.
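
For reference, the relevant part of stop_machine_cpu_stop() is the
state machine loop; trimmed from kernel/stop_machine.c (details may
differ slightly in your tree):

static int stop_machine_cpu_stop(void *data)
{
	struct stop_machine_data *smdata = data;
	enum stopmachine_state curstate = STOPMACHINE_NONE;
	...
	/* Simple state machine */
	do {
		/* Chill out and ensure we re-read stopmachine_state. */
		cpu_relax();
		if (smdata->state != curstate) {
			curstate = smdata->state;
			/* ... disable interrupts, run smdata->fn() ... */
			ack_state(smdata);
		}
	} while (curstate != STOPMACHINE_EXIT);
	...
}

ack_state() only advances smdata->state once every participating cpu
has acked the current state, so if the stopper on one cpu never runs
its work, the other cpus spin in this loop forever - which would
explain the soft lockup above.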

The main difference between the old code and the new one is that the
thread is created earlier and is parked on cpu offline instead of
being destroyed.
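
That is because the stopper is now a smpboot thread; from the commit,
roughly:

static struct smp_hotplug_thread cpu_stop_threads = {
	.store			= &cpu_stopper_task,
	.thread_should_run	= cpu_stop_should_run,
	.thread_fn		= cpu_stopper_thread,
	.thread_comm		= "migration/%u",
	.create			= cpu_stop_create,
	.setup			= cpu_stop_unpark,
	.park			= cpu_stop_park,
	.pre_unpark		= cpu_stop_unpark,
	.selfparking		= true,
};

selfparking means the smpboot core does not park the thread from the
outside; the stopper parks itself at the proper point in the hotplug
sequence.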

Could you add some instrumentation so we can see what kind of cpu
stop work is scheduled and from where?
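
Untested sketch of what I mean - a trace_printk() in
cpu_stop_queue_work() (exact context depends on your tree) would show
both the work and the caller:

	/* log every cpu stop work when it is queued */
	trace_printk("cpu_stop: fn=%pf arg=%p queued from %pS\n",
		     work->fn, work->arg, (void *)_RET_IP_);

A matching print in cpu_stopper_thread() when the work is actually
executed would tell us whether the work ever reaches the stopper on
the hung cpu.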

Thanks,

tglx