From: Jeremy Fitzhardinge
Date: Fri Jan 30 2009 - 19:48:29 EST

Ingo Molnar wrote:
* Ingo Molnar <mingo@xxxxxxx> wrote:

* Ingo Molnar <mingo@xxxxxxx> wrote:

Call Trace:
[<ffffffff80238c1f>] __schedule_bug+0x62/0x66
[<ffffffff80211d2d>] ? retint_restore_args+0x5/0x20
[<ffffffff80503921>] __schedule+0x95/0x792
[<ffffffff802093aa>] ? _stext+0x3aa/0x1000
[<ffffffff802093aa>] ? _stext+0x3aa/0x1000
[<ffffffff805040c2>] schedule+0xe/0x22
[<ffffffff8020ff04>] cpu_idle+0x70/0x72
[<ffffffff804fc3a0>] cpu_bringup_and_idle+0x13/0x15
Creating initial device nodes
Setting up hotplug.

From what I can see, softirq 8 is the RCU softirq. I don't know if the "scheduling while atomic" is related or not, but it's two new schedulerish symptoms appearing at once, so I think it's likely they're related.
Hmmm... Mysterious, as you seem to be using classic RCU, which hasn't
changed in a while. Which branch of the tip tree are you using?
tip/master. It looks like this appeared since -rc1. My current suspicion is the percpu changes, since I'm seeing some other strange symptoms.
Cc:-ed more folks - it's either the percpu changes or the APIC changes (both occurred at about the same time). Or maybe something from upstream.
managed to bisect one of the boot crashes i've been seeing:

a698c823e15149941b0f0281527d0c0d1daf2639 is first bad commit
commit a698c823e15149941b0f0281527d0c0d1daf2639
Author: Tejun Heo <tj@xxxxxxxxxx>
Date: Tue Jan 13 20:41:35 2009 +0900

x86: make use PERCPU() macro
Make use the generic PERCPU() macro instead of open
coding it. This will ease future changes.
Signed-off-by: Tejun Heo <tj@xxxxxxxxxx>
Signed-off-by: Ingo Molnar <mingo@xxxxxxx>

# bad: [b16884e8] Merge branch 'x86/urgent'
# good: [f2257b70] Merge git://
# good: [345fa66b] Merge branch 'core/locking'
# bad: [f534caca] Merge branch 'oprofile'
# bad: [3eb3963f] Merge branch 'cpus4096' into core/percpu
# bad: [1b437c8c] x86-64: Move irq stats from PDA to per-cpu and con
# good: [54da5b3d] x86: fix broken flush_tlb_others_ipi(), fix
# good: [c2c21745] x86: replacing mp_config_intsrc with mpc_intsrc
# bad: [c8f3329a] x86: use static _cpu_pda array
# good: [7de6883f] x86: fix pda_to_op()
# bad: [a698c823] x86: make use PERCPU() macro
# good: [c90aa894] x86: cleanup early setup_percpu references
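
For readers unfamiliar with how a log like the one above narrows down to a single commit, here is a minimal sketch of the underlying binary search. It is an illustration only, not how git itself is implemented: it assumes a strictly linear history (the real tip tree is a graph of merges), the ordering of the abbreviated ids in `history` is hypothetical, and `first_bad` and `is_bad` are names invented for this example.

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first commit for which is_bad() is true.

    Assumes commits are ordered oldest to newest, commits[0] is good,
    commits[-1] is bad, and every commit after the first bad one is bad too.
    """
    lo, hi = 0, len(commits) - 1  # lo is known good, hi is known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # failure reproduces here: culprit is at or before mid
        else:
            lo = mid  # boots fine here: culprit is after mid
    return commits[hi]


# Hypothetical linear ordering using abbreviated ids from the log above.
history = ["f2257b70", "c90aa894", "a698c823", "c8f3329a", "1b437c8c", "b16884e8"]
bad_from = history.index("a698c823")

# Stand-in for "build, boot, and see whether it crashes".
culprit = first_bad(history, lambda c: history.index(c) >= bad_from)
print(culprit)
```

Each call to `is_bad` corresponds to one build-and-boot test, so the number of tests grows with the logarithm of the number of commits in the range.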

testing the revert now.

This might be similar to the other 32-bit linker bug that was tracked down yesterday and reverted - maybe that revert unearthed a problem with this commit?

seems to do the trick.

Tejun, a detail, this config has:


Have you considered 32-bit relocatable kernels too? Config attached.

I found my bug. Turns out all my CPUs were sharing the same kernel stack (!), which means it was working surprisingly well, considering...

