oops due to smp_call_function_single changes

From: Avi Kivity
Date: Sun Aug 24 2008 - 12:41:30 EST


My 2s x 2c Intel server (Xeon 5150) won't boot anymore. I bisected this to

commit cc7a486cac78f6fc1a24e8cd63036bae8d2ab431
Author: Nick Piggin <nickpiggin@xxxxxxxxxxxx>
Date: Mon Aug 11 13:49:30 2008 +1000

    generic-ipi: fix stack and rcu interaction bug in smp_call_function_mask()

    * Venki Pallipadi <venkatesh.pallipadi@xxxxxxxxx> wrote:

    > Found a OOPS on a big SMP box during an overnight reboot test with
    > upstream git.
    >
    > Suresh and I looked at the oops and looks like the root cause is in
    > generic_smp_call_function_interrupt() and smp_call_function_mask() with
    > wait parameter.
    >
    [...]

    Nice debugging work.

    I'd suggest something like the attached (boot tested) patch as the simple
    fix for now.

    I expect the benefits from the less synchronized, multiple-in-flight-data
    global queue will still outweigh the costs of dynamic allocations. But
    if worst comes to worst then we just go back to a globally synchronous
    one-at-a-time implementation, but that would be pretty sad!

    Signed-off-by: Ingo Molnar <mingo@xxxxxxx>


Reverting this commit (and cc7a486cac78f6fc1a24e8cd63036bae8d2ab431, which is an add-on fix) allows my guest to boot.

My .config can be found at http://userweb.kernel.org/~avi/scf-oops/config. I have an oops somewhere inside a mobile phone but have yet to find a way to dig it out. Built-in netconsole doesn't work for me for some reason, and the oops happens during boot (I think while the ahci modules are loading).

--
error compiling committee.c: too many arguments to function
