Re: Seeing "huh, entered softirq 8 ffffffff802682aa preempt_count00000100, exited with 00010100?" in tip.git

From: Jeremy Fitzhardinge
Date: Fri Jan 30 2009 - 11:41:33 EST


Ingo Molnar wrote:
* Jeremy Fitzhardinge <jeremy@xxxxxxxx> wrote:

Paul E. McKenney wrote:
On Thu, Jan 29, 2009 at 02:12:31PM -0800, Jeremy Fitzhardinge wrote:
When I boot my x86-64 tip.git kernel under Xen, I'm seeing:

pnp: PnP ACPI: disabled
NET: Registered protocol family 2
huh, entered softirq 8 ffffffff802682aa preempt_count 00000100, exited with 00010100?
IP route cache hash table entries: 512 (order: 0, 4096 bytes)
TCP established hash table entries: 1024 (order: 2, 16384 bytes)
[...]
XENBUS: Device with no driver: device/console/0
Freeing unused kernel memory: 372k freed
Red Hat nash version 6.0.19 starting
Mounting proc filesystem
Mounting sysfs filesystem
Creating /dev
BUG: scheduling while atomic: swapper/0/0x00010000
Modules linked in:
Pid: 0, comm: swapper Not tainted 2.6.29-rc3-tip #481
Call Trace:
[<ffffffff80238c1f>] __schedule_bug+0x62/0x66
[<ffffffff80211d2d>] ? retint_restore_args+0x5/0x20
[<ffffffff80503921>] __schedule+0x95/0x792
[<ffffffff802093aa>] ? _stext+0x3aa/0x1000
[<ffffffff802093aa>] ? _stext+0x3aa/0x1000
[<ffffffff805040c2>] schedule+0xe/0x22
[<ffffffff8020ff04>] cpu_idle+0x70/0x72
[<ffffffff804fc3a0>] cpu_bringup_and_idle+0x13/0x15
Creating initial device nodes
Setting up hotplug.


From what I can see, softirq 8 is the RCU softirq. I don't know if the "scheduling while atomic" is related or not, but its two new schedulerish symptoms appearing at once, so I think its likely they're related.
Hmmm... Mysterious, as you seem to be using classic RCU, which hasn't
changed in awhile. Which branch of the tip tree are you using?
tip/master. It looks like this appeared since -rc1. Mu current suspicion is the percpu changes, since I'm seeing some other strange symptoms.

Cc:-ed more folks - it's either the percpu changes or the APIC changes (both occured at about the same time). Or maybe something from upstream.

In my case I'm only seeing it when running under Xen, so I don't think this particular problem is apic related. My current theory - dreamed up last night and completely untested/uninvestigated - is that interrupt masking isn't working. In Xen, interrupts are masked via a per-cpu piece of memory shared with the hypervisor, so if the percpu setup is wrong in some way, the kernel and Xen could be using different memory for the mask, with hilarious results.

J
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/