stop_machine lockup issue in 3.9.y.

From: Ben Greear
Date: Wed Jun 05 2013 - 15:31:19 EST


This is no longer really about the module unlink, so changing
subject.

On 06/05/2013 12:11 PM, Ben Greear wrote:
On 06/05/2013 11:48 AM, Tejun Heo wrote:
Hello, Ben.

On Wed, Jun 05, 2013 at 09:59:00AM -0700, Ben Greear wrote:
One pattern I notice repeating for at least most of the hangs is that all but one
CPU thread has irqs disabled and is in state 2. But, there will be one thread
in state 1 that still has IRQs enabled and it is reported to be in soft-lockup
instead of hard-lockup. In 'sysrq l' it always shows some IRQ processing,
but typically that of the sysrq itself. I added printk that would always
print if the thread notices that smdata->state != curstate, and the soft-lockup
thread (cpu 2 below) never shows that message.

It sounds like one of the cpus get live-locked by IRQs. I can't tell
why the situation is made worse by other CPUs being tied up. Do you
ever see CPUs being live locked by IRQs during normal operation?

No, I have not noticed any live locks aside from this, at least in
the 3.9 kernels.

I thought it might be because it was reading stale smdata->state, so I changed
that to atomic_t hoping that would mitigate that. I also tried adding smp_rmb()
below the cpu_relax(). Neither had any affect, so I am left assuming that the

I looked at the code again and the memory accesses seem properly
interlocked. It's a bit tricky and should probably have used spinlock
instead considering it's already a hugely expensive path anyway, but
it does seem correct to me.

thread instead is stuck handling IRQs and never gets out of the IRQ handler.

Seems that way to me too.

Maybe since I have 2 real cores, and 3 processes busy-spinning on their CPU cores,
the remaining process can just never handle all the IRQs and get back to the
cpu shutdown state machine? The various soft-hang stacks below show at least slightly
different stacks, so I assume that thread is doing at least something.

What's the source of all those IRQs tho? I don't think the IRQs are
from actual events. The system is quiesced. Even if it's from
receiving packets, it's gonna quiet down pretty quickly. The hang
doesn't go away if you disconnect the network cable while hung, right?

What could be happening is that IRQ handling is handled by a thread
but the IRQ handler itself doesn't clear the IRQ properly and depends
on the handling thread to clear the condition. If no CPU is available
for scheduling, it might end up raising and re-reraising IRQs for the
same condition without ever being handled. If that's the case, such
lockup could happen on a normally functioning UP machine or if the IRQ
is pinned to a single CPU which happens to be running the handling
thread. At any rate, it'd be a plain live-lock bug on the driver
side.

Can you please try to confirm the specific interrupt being
continuously raised? Detecting the hang shouldn't be too difficult.
Just recording the starting jiffies and if progress hasn't been made
for, say, ten seconds, it can set a flag and then print the IRQs being
handled if the flag is set. If it indeed is the ath device, we
probably wanna get the driver maintainer involved.

I am not sure how to tell which IRQ is being handled. Do the
stack traces (showing smp_apic_timer_interrupt, for instance)
indicate potential culprits, or is that more a symptom of just
when the soft-lockup check is called?


Where should I add code to print out irqs? In the lockup state,
the thread (probably) stuck handling irqs isn't executing any code in
the stop_machine file as far as I can tell.

Maybe I need to instrument the __do_softirq or similar method?

For what it's worth, previous debugging appears to show that jiffies
stops incrementing in many of these lockups.

Also, I have been trying for 20+ minutes to reproduce the lockup
with the ath9k module removed (and my user-space app that uses it
stopped), and I have not reproduced it yet. So, possibly it is
related to ath9k, but my user-space app pokes at lots of other
stuff and starts loads of dhcp client processes and such too,
so not sure yet.


I re-added ath9k, turned on my app (to create 400 stations, etc),
re-started the module unload/load loop:
for i in `seq 10000`; do modprobe macvlan; rmmod macvlan; done
and hit the problem fairly quickly.

This is on stock 3.9.4, with Rusty's kobj patch, and some printk debugging
I added to the stop_machine.c file.

Perhaps interestingly, I do see an ath9k warning/error in this log as well.

Also, since lockdep is enabled, we get some irq printouts. Does this mean
anything to you?

__stop_machine(upstream): num-threads: 4, fn: __try_stop_module(ffffffff810f330a) data: ffff8801c594bf28 smdata: ffff8801c594be58
set_state, cpu: 0 state: 0 newstate: 1 smdata: ffff8801c594be58 fn: ffffffff810f330a
cpu: 1 loops: 1 jiffies: 4297177162 timeout: 4297177161 curstate: 0 smdata->state: 1 thread_ack: 4 smdata: ffff8801c594be58 fn: ffffffff810f330a
cpu: 0 loops: 1 jiffies: 4297177162 timeout: 4297177161 curstate: 0 smdata->state: 1 thread_ack: 4 smdata: ffff8801c594be58 fn: ffffffff810f330a
state-change: cpu: 0 loops: 1 jiffies: 4297177162 curstate: 0 smdata->state: 1 thread_ack: 4 smdata: ffff8801c594be58 fn: ffffffff810f330a
cpu: 0 ack_state jiffies: 4297177162 smdata->state: 1 thread_ack: 4 smdata: ffff8801c594be58 fn: ffffffff810f330a
cpu: 3 loops: 1 jiffies: 4297177162 timeout: 4297177161 curstate: 0 smdata->state: 1 thread_ack: 4 smdata: ffff8801c594be58 fn: ffffffff810f330a
cpu: 0 ack_state end jiffies: 4297177162 smdata->state: 1 thread_ack: 3 smdata: ffff8801c594be58 fn: ffffffff810f330a
cpu: 2 loops: 1 jiffies: 4297177162 timeout: 4297177161 curstate: 0 smdata->state: 1 thread_ack: 3 smdata: ffff8801c594be58 fn: ffffffff810f330a
state-change: cpu: 3 loops: 1 jiffies: 4297177162 curstate: 0 smdata->state: 1 thread_ack: 3 smdata: ffff8801c594be58 fn: ffffffff810f330a
cpu: 3 ack_state jiffies: 4297177162 smdata->state: 1 thread_ack: 3 smdata: ffff8801c594be58 fn: ffffffff810f330a
state-change: cpu: 2 loops: 1 jiffies: 4297177162 curstate: 0 smdata->state: 1 thread_ack: 3 smdata: ffff8801c594be58 fn: ffffffff810f330a
cpu: 3 ack_state end jiffies: 4297177162 smdata->state: 1 thread_ack: 2 smdata: ffff8801c594be58 fn: ffffffff810f330a
cpu: 2 ack_state jiffies: 4297177162 smdata->state: 1 thread_ack: 2 smdata: ffff8801c594be58 fn: ffffffff810f330a
cpu: 2 ack_state end jiffies: 4297177162 smdata->state: 1 thread_ack: 1 smdata: ffff8801c594be58 fn: ffffffff810f330a
state-change: cpu: 1 loops: 1 jiffies: 4297177313 curstate: 0 smdata->state: 1 thread_ack: 1 smdata: ffff8801c594be58 fn: ffffffff810f330a
cpu: 1 ack_state jiffies: 4297177326 smdata->state: 1 thread_ack: 1 smdata: ffff8801c594be58 fn: ffffffff810f330a
cpu: 1 increment state, smdata: ffff8801c594be58 fn: ffffffff810f330a
set_state, cpu: 1 state: 1 newstate: 2 smdata: ffff8801c594be58 fn: ffffffff810f330a
cpu: 1 ack_state end jiffies: 4297177350 smdata->state: 2 thread_ack: 4 smdata: ffff8801c594be58 fn: ffffffff810f330a
state-change: cpu: 0 loops: 12226751 jiffies: 4297177350 curstate: 1 smdata->state: 2 thread_ack: 4 smdata: ffff8801c594be58 fn: ffffffff810f330a
state-change: cpu: 2 loops: 12286477 jiffies: 4297177350 curstate: 1 smdata->state: 2 thread_ack: 4 smdata: ffff8801c594be58 fn: ffffffff810f330a
cpu: 0 ack_state jiffies: 4297177350 smdata->state: 2 thread_ack: 4 smdata: ffff8801c594be58 fn: ffffffff810f330a
cpu: 0 ack_state end jiffies: 4297177350 smdata->state: 2 thread_ack: 3 smdata: ffff8801c594be58 fn: ffffffff810f330a
state-change: cpu: 3 loops: 25634226 jiffies: 4297177350 curstate: 1 smdata->state: 2 thread_ack: 4 smdata: ffff8801c594be58 fn: ffffffff810f330a
cpu: 3 ack_state jiffies: 4297177350 smdata->state: 2 thread_ack: 3 smdata: ffff8801c594be58 fn: ffffffff810f330a
cpu: 3 ack_state end jiffies: 4297177350 smdata->state: 2 thread_ack: 2 smdata: ffff8801c594be58 fn: ffffffff810f330a
cpu: 2 ack_state jiffies: 4297177350 smdata->state: 2 thread_ack: 2 smdata: ffff8801c594be58 fn: ffffffff810f330a
cpu: 2 ack_state end jiffies: 4297177350 smdata->state: 2 thread_ack: 1 smdata: ffff8801c594be58 fn: ffffffff810f330a
state-change: cpu: 1 loops: 2 jiffies: 4297177350 curstate: 1 smdata->state: 2 thread_ack: 1 smdata: ffff8801c594be58 fn: ffffffff810f330a
cpu: 1 ack_state jiffies: 4297177350 smdata->state: 2 thread_ack: 1 smdata: ffff8801c594be58 fn: ffffffff810f330a
cpu: 1 increment state, smdata: ffff8801c594be58 fn: ffffffff810f330a
set_state, cpu: 1 state: 2 newstate: 3 smdata: ffff8801c594be58 fn: ffffffff810f330a
cpu: 1 ack_state end jiffies: 4297177350 smdata->state: 3 thread_ack: 4 smdata: ffff8801c594be58 fn: ffffffff810f330a
state-change: cpu: 3 loops: 47817385 jiffies: 4297177350 curstate: 2 smdata->state: 3 thread_ack: 4 smdata: ffff8801c594be58 fn: ffffffff810f330a
state-change: cpu: 2 loops: 31933478 jiffies: 4297177350 curstate: 2 smdata->state: 3 thread_ack: 4 smdata: ffff8801c594be58 fn: ffffffff810f330a
state-change: cpu: 0 loops: 31875466 jiffies: 4297177350 curstate: 2 smdata->state: 3 thread_ack: 4 smdata: ffff8801c594be58 fn: ffffffff810f330a
cpu: 2 ack_state jiffies: 4297177350 smdata->state: 3 thread_ack: 4 smdata: ffff8801c594be58 fn: ffffffff810f330a
cpu: 3 ack_state jiffies: 4297177350 smdata->state: 3 thread_ack: 4 smdata: ffff8801c594be58 fn: ffffffff810f330a
calling fn: cpu: 0 loops: 31875466 curstate: 3 smdata->state: 3 thread_ack: 4 smdata: ffff8801c594be58 fn: ffffffff810f330a
cpu: 2 ack_state end jiffies: 4297177350 smdata->state: 3 thread_ack: 3 smdata: ffff8801c594be58 fn: ffffffff810f330a
cpu: 0 ack_state jiffies: 4297177350 smdata->state: 3 thread_ack: 2 smdata: ffff8801c594be58 fn: ffffffff810f330a
cpu: 3 ack_state end jiffies: 4297177350 smdata->state: 3 thread_ack: 2 smdata: ffff8801c594be58 fn: ffffffff810f330a
cpu: 0 ack_state end jiffies: 4297177350 smdata->state: 3 thread_ack: 1 smdata: ffff8801c594be58 fn: ffffffff810f330a
state-change: cpu: 1 loops: 3 jiffies: 4297177350 curstate: 2 smdata->state: 3 thread_ack: 1 smdata: ffff8801c594be58 fn: ffffffff810f330a
cpu: 1 ack_state jiffies: 4297177350 smdata->state: 3 thread_ack: 1 smdata: ffff8801c594be58 fn: ffffffff810f330a
cpu: 1 increment state, smdata: ffff8801c594be58 fn: ffffffff810f330a
set_state, cpu: 1 state: 3 newstate: 4 smdata: ffff8801c594be58 fn: ffffffff810f330a
cpu: 1 ack_state end jiffies: 4297177350 smdata->state: 4 thread_ack: 4 smdata: ffff8801c594be58 fn: ffffffff810f330a
state-change: cpu: 3 loops: 71662403 jiffies: 4297177350 curstate: 3 smdata->state: 4 thread_ack: 4 smdata: ffff8801c594be58 fn: ffffffff810f330a
state-change: cpu: 2 loops: 53246516 jiffies: 4297177350 curstate: 3 smdata->state: 4 thread_ack: 4 smdata: ffff8801c594be58 fn: ffffffff810f330a
state-change: cpu: 0 loops: 53187256 jiffies: 4297177350 curstate: 3 smdata->state: 4 thread_ack: 4 smdata: ffff8801c594be58 fn: ffffffff810f330a
cpu: 2 ack_state jiffies: 4297177350 smdata->state: 4 thread_ack: 4 smdata: ffff8801c594be58 fn: ffffffff810f330a
cpu: 0 ack_state jiffies: 4297177350 smdata->state: 4 thread_ack: 4 smdata: ffff8801c594be58 fn: ffffffff810f330a
cpu: 2 ack_state end jiffies: 4297177350 smdata->state: 4 thread_ack: 3 smdata: ffff8801c594be58 fn: ffffffff810f330a
cpu: 0 ack_state end jiffies: 4297177350 smdata->state: 4 thread_ack: 2 smdata: ffff8801c594be58 fn: ffffffff810f330a
cpu: 3 ack_state jiffies: 4297177350 smdata->state: 4 thread_ack: 4 smdata: ffff8801c594be58 fn: ffffffff810f330a
cpu: 3 ack_state end jiffies: 4297177656 smdata->state: 4 thread_ack: 1 smdata: ffff8801c594be58 fn: ffffffff810f330a
state-change: cpu: 1 loops: 4 jiffies: 4297177768 curstate: 3 smdata->state: 4 thread_ack: 1 smdata: ffff8801c594be58 fn: ffffffff810f330a
cpu: 1 ack_state jiffies: 4297177780 smdata->state: 4 thread_ack: 1 smdata: ffff8801c594be58 fn: ffffffff810f330a
cpu: 1 increment state, smdata: ffff8801c594be58 fn: ffffffff810f330a
set_state, cpu: 1 state: 4 newstate: 5 smdata: ffff8801c594be58 fn: ffffffff810f330a
cpu: 1 ack_state end jiffies: 4297177804 smdata->state: 5 thread_ack: 4 smdata: ffff8801c594be58 fn: ffffffff810f330a
ath: wiphy0: Failed to stop TX DMA, queues=0x005!
__stop_machine(upstream): num-threads: 4, fn: __unlink_module(ffffffff810f2dab) data: ffffffffa117aad0 smdata: ffff8801c594be18
set_state, cpu: 0 state: 0 newstate: 1 smdata: ffff8801c594be18 fn: ffffffff810f2dab
sta201: authenticate with 00:de:ad:1d:ea:01
cpu: 1 loops: 1 jiffies: 4297178887 timeout: 4297178886 curstate: 0 smdata->state: 1 thread_ack: 4 smdata: ffff8801c594be18 fn: ffffffff810f2dab
cpu: 2 loops: 1 jiffies: 4297178887 timeout: 4297178886 curstate: 0 smdata->state: 1 thread_ack: 4 smdata: ffff8801c594be18 fn: ffffffff810f2dab
cpu: 0 loops: 1 jiffies: 4297178887 timeout: 4297178886 curstate: 0 smdata->state: 1 thread_ack: 4 smdata: ffff8801c594be18 fn: ffffffff810f2dab
state-change: cpu: 2 loops: 1 jiffies: 4297178887 curstate: 0 smdata->state: 1 thread_ack: 4 smdata: ffff8801c594be18 fn: ffffffff810f2dab
cpu: 3 loops: 1 jiffies: 4297178887 timeout: 4297178886 curstate: 0 smdata->state: 1 thread_ack: 4 smdata: ffff8801c594be18 fn: ffffffff810f2dab
cpu: 2 ack_state jiffies: 4297178887 smdata->state: 1 thread_ack: 4 smdata: ffff8801c594be18 fn: ffffffff810f2dab
state-change: cpu: 0 loops: 1 jiffies: 4297178887 curstate: 0 smdata->state: 1 thread_ack: 4 smdata: ffff8801c594be18 fn: ffffffff810f2dab
cpu: 2 ack_state end jiffies: 4297178887 smdata->state: 1 thread_ack: 3 smdata: ffff8801c594be18 fn: ffffffff810f2dab
state-change: cpu: 3 loops: 1 jiffies: 4297178887 curstate: 0 smdata->state: 1 thread_ack: 4 smdata: ffff8801c594be18 fn: ffffffff810f2dab
cpu: 0 ack_state jiffies: 4297178887 smdata->state: 1 thread_ack: 3 smdata: ffff8801c594be18 fn: ffffffff810f2dab
cpu: 3 ack_state jiffies: 4297178887 smdata->state: 1 thread_ack: 3 smdata: ffff8801c594be18 fn: ffffffff810f2dab
cpu: 0 ack_state end jiffies: 4297178887 smdata->state: 1 thread_ack: 2 smdata: ffff8801c594be18 fn: ffffffff810f2dab
cpu: 3 ack_state end jiffies: 4297178887 smdata->state: 1 thread_ack: 1 smdata: ffff8801c594be18 fn: ffffffff810f2dab
state-change: cpu: 1 loops: 1 jiffies: 4297179040 curstate: 0 smdata->state: 1 thread_ack: 1 smdata: ffff8801c594be18 fn: ffffffff810f2dab
cpu: 1 ack_state jiffies: 4297179054 smdata->state: 1 thread_ack: 1 smdata: ffff8801c594be18 fn: ffffffff810f2dab
cpu: 1 increment state, smdata: ffff8801c594be18 fn: ffffffff810f2dab
set_state, cpu: 1 state: 1 newstate: 2 smdata: ffff8801c594be18 fn: ffffffff810f2dab
cpu: 1 ack_state end jiffies: 4297179083 smdata->state: 2 thread_ack: 4 smdata: ffff8801c594be18 fn: ffffffff810f2dab
state-change: cpu: 2 loops: 25037141 jiffies: 4297179083 curstate: 1 smdata->state: 2 thread_ack: 4 smdata: ffff8801c594be18 fn: ffffffff810f2dab
state-change: cpu: 3 loops: 28970837 jiffies: 4297179083 curstate: 1 smdata->state: 2 thread_ack: 4 smdata: ffff8801c594be18 fn: ffffffff810f2dab
cpu: 2 ack_state jiffies: 4297179083 smdata->state: 2 thread_ack: 4 smdata: ffff8801c594be18 fn: ffffffff810f2dab
cpu: 3 ack_state jiffies: 4297179083 smdata->state: 2 thread_ack: 4 smdata: ffff8801c594be18 fn: ffffffff810f2dab
state-change: cpu: 0 loops: 25898133 jiffies: 4297179083 curstate: 1 smdata->state: 2 thread_ack: 4 smdata: ffff8801c594be18 fn: ffffffff810f2dab
cpu: 2 ack_state end jiffies: 4297179083 smdata->state: 2 thread_ack: 3 smdata: ffff8801c594be18 fn: ffffffff810f2dab
cpu: 0 ack_state jiffies: 4297179083 smdata->state: 2 thread_ack: 2 smdata: ffff8801c594be18 fn: ffffffff810f2dab
cpu: 3 ack_state end jiffies: 4297179083 smdata->state: 2 thread_ack: 2 smdata: ffff8801c594be18 fn: ffffffff810f2dab
cpu: 0 ack_state end jiffies: 4297179083 smdata->state: 2 thread_ack: 1 smdata: ffff8801c594be18 fn: ffffffff810f2dab
------------[ cut here ]------------
WARNING: at /home/greearb/git/linux-2.6.linus/kernel/watchdog.c:245 watchdog_overflow_callback+0x9c/0xa7()
Hardware name: To be filled by O.E.M.
Watchdog detected hard LOCKUP on cpu 2
Modules linked in: ath9k ath9k_common ath9k_hw ath mac80211 cfg80211 nfsv4 auth_rpcgss nfs fscache nf_nat_ipv4 nf_nat veth 8021q garp stp mrp llc pktgen lockd sunrpc]
Pid: 23, comm: migration/2 Tainted: G C 3.9.4+ #11
Call Trace:
<NMI> [<ffffffff810977f1>] warn_slowpath_common+0x85/0x9f
[<ffffffff810978ae>] warn_slowpath_fmt+0x46/0x48
[<ffffffff8110f42d>] watchdog_overflow_callback+0x9c/0xa7
[<ffffffff8113feb6>] __perf_event_overflow+0x137/0x1cb
[<ffffffff8101dff6>] ? x86_perf_event_set_period+0x103/0x10f
[<ffffffff811403fa>] perf_event_overflow+0x14/0x16
[<ffffffff81023730>] intel_pmu_handle_irq+0x2dc/0x359
[<ffffffff815eee05>] perf_event_nmi_handler+0x19/0x1b
[<ffffffff815ee5f3>] nmi_handle+0x7f/0xc2
[<ffffffff815ee574>] ? oops_begin+0xa9/0xa9
[<ffffffff815ee6f2>] do_nmi+0xbc/0x304
[<ffffffff815edd81>] end_repeat_nmi+0x1e/0x2e
[<ffffffff81099fce>] ? vprintk_emit+0x40a/0x444
[<ffffffff81104ef8>] ? stop_machine_cpu_stop+0xd8/0x274
[<ffffffff81104ef8>] ? stop_machine_cpu_stop+0xd8/0x274
[<ffffffff81104ef8>] ? stop_machine_cpu_stop+0xd8/0x274
<<EOE>> [<ffffffff810f2dab>] ? copy_module_from_fd+0xe7/0xe7
[<ffffffff810f2dab>] ? copy_module_from_fd+0xe7/0xe7
[<ffffffff810f2dab>] ? copy_module_from_fd+0xe7/0xe7
[<ffffffff81104e20>] ? stop_one_cpu_nowait+0x30/0x30
[<ffffffff81104b8d>] cpu_stopper_thread+0xae/0x162
[<ffffffff815ebb1f>] ? __schedule+0x5ef/0x637
[<ffffffff815ecf38>] ? _raw_spin_unlock_irqrestore+0x47/0x7e
[<ffffffff810e92cc>] ? trace_hardirqs_on_caller+0x123/0x15a
[<ffffffff810e9310>] ? trace_hardirqs_on+0xd/0xf
[<ffffffff815ecf61>] ? _raw_spin_unlock_irqrestore+0x70/0x7e
[<ffffffff810bef34>] smpboot_thread_fn+0x258/0x260
[<ffffffff810becdc>] ? test_ti_thread_flag.clone.0+0x11/0x11
[<ffffffff810b7c22>] kthread+0xc7/0xcf
[<ffffffff810b7b5b>] ? __init_kthread_worker+0x5b/0x5b
[<ffffffff815f3b6c>] ret_from_fork+0x7c/0xb0
[<ffffffff810b7b5b>] ? __init_kthread_worker+0x5b/0x5b
---[ end trace 4947dfa9b0a4cec3 ]---
BUG: soft lockup - CPU#1 stuck for 22s! [migration/1:17]
Modules linked in: ath9k ath9k_common ath9k_hw ath mac80211 cfg80211 nfsv4 auth_rpcgss nfs fscache nf_nat_ipv4 nf_nat veth 8021q garp stp mrp llc pktgen lockd sunrpc]
irq event stamp: 835637905
hardirqs last enabled at (835637904): [<ffffffff8109f4c1>] __do_softirq+0x9f/0x257
hardirqs last disabled at (835637905): [<ffffffff815f48ad>] apic_timer_interrupt+0x6d/0x80
softirqs last enabled at (5654720): [<ffffffff8109f621>] __do_softirq+0x1ff/0x257
softirqs last disabled at (5654725): [<ffffffff8109f743>] irq_exit+0x5f/0xbb
CPU 1
Pid: 17, comm: migration/1 Tainted: G WC 3.9.4+ #11 To be filled by O.E.M. To be filled by O.E.M./To be filled by O.E.M.
RIP: 0010:[<ffffffff8109ee72>] [<ffffffff8109ee72>] tasklet_hi_action+0xf0/0xf0
RSP: 0018:ffff88022bc83ef0 EFLAGS: 00000212
RAX: 0000000000000006 RBX: ffff880217deb710 RCX: 0000000000000006
RDX: 0000000000000006 RSI: 0000000000000000 RDI: ffffffff81a050b0
RBP: ffff88022bc83f78 R08: ffffffff81a050b0 R09: ffff88022bc83cc8
R10: 00000000000005f2 R11: ffff8802203aaf50 R12: ffff88022bc83e68
R13: ffffffff815f48b2 R14: ffff88022bc83f78 R15: ffff88022230e000
FS: 0000000000000000(0000) GS:ffff88022bc80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000430070 CR3: 00000001cbc5d000 CR4: 00000000000007e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process migration/1 (pid: 17, threadinfo ffff88022230e000, task ffff8802223142c0)
Stack:
ffffffff8109f539 ffff88022bc83f08 ffff88022230e010 042080402bc83f88
000000010021bfcd 000000012bc83fa8 ffff88022230e000 ffff88022230ffd8
0000000000000030 ffff880200000006 00000248d8cdab1c 1304da35fe841722
Call Trace:
<IRQ>
[<ffffffff8109f539>] ? __do_softirq+0x117/0x257
[<ffffffff8109f743>] irq_exit+0x5f/0xbb
[<ffffffff815f59fd>] smp_apic_timer_interrupt+0x8a/0x98
[<ffffffff815f48b2>] apic_timer_interrupt+0x72/0x80
<EOI>
[<ffffffff81099fdb>] ? vprintk_emit+0x417/0x444
[<ffffffff815e9fc0>] printk+0x4d/0x4f
[<ffffffff81104b36>] ? cpu_stopper_thread+0x57/0x162
[<ffffffff8110504c>] stop_machine_cpu_stop+0x22c/0x274
[<ffffffff810f2dab>] ? copy_module_from_fd+0xe7/0xe7
[<ffffffff810f2dab>] ? copy_module_from_fd+0xe7/0xe7
[<ffffffff810f2dab>] ? copy_module_from_fd+0xe7/0xe7
[<ffffffff81104e20>] ? stop_one_cpu_nowait+0x30/0x30
[<ffffffff81104b8d>] cpu_stopper_thread+0xae/0x162
[<ffffffff815ebb1f>] ? __schedule+0x5ef/0x637
[<ffffffff815ecf38>] ? _raw_spin_unlock_irqrestore+0x47/0x7e
[<ffffffff810e92cc>] ? trace_hardirqs_on_caller+0x123/0x15a
[<ffffffff810e9310>] ? trace_hardirqs_on+0xd/0xf
[<ffffffff815ecf61>] ? _raw_spin_unlock_irqrestore+0x70/0x7e
[<ffffffff810bef34>] smpboot_thread_fn+0x258/0x260
[<ffffffff810becdc>] ? test_ti_thread_flag.clone.0+0x11/0x11
[<ffffffff810b7c22>] kthread+0xc7/0xcf
[<ffffffff810b7b5b>] ? __init_kthread_worker+0x5b/0x5b
[<ffffffff815f3b6c>] ret_from_fork+0x7c/0xb0
[<ffffffff810b7b5b>] ? __init_kthread_worker+0x5b/0x5b
Code: 1c 25 18 e2 00 00 e8 cd fe ff ff e8 ac a4 04 00 fb 66 66 90 66 66 90 4c 89 e3 48 85 db 0f 85 79 ff ff ff 5f 5b 41 5c 41 5d c9 c3 <55> 48 89 e5 41 55 41 54 53 4
------------[ cut here ]------------
WARNING: at /home/greearb/git/linux-2.6.linus/kernel/watchdog.c:245 watchdog_overflow_callback+0x9c/0xa7()
Hardware name: To be filled by O.E.M.
Watchdog detected hard LOCKUP on cpu 0
Modules linked in: ath9k ath9k_common ath9k_hw ath mac80211 cfg80211 nfsv4 auth_rpcgss nfs fscache nf_nat_ipv4 nf_nat veth 8021q garp stp mrp llc pktgen lockd sunrpc]
Pid: 8, comm: migration/0 Tainted: G WC 3.9.4+ #11
Call Trace:
<NMI> [<ffffffff810977f1>] warn_slowpath_common+0x85/0x9f
[<ffffffff810978ae>] warn_slowpath_fmt+0x46/0x48
[<ffffffff8110f42d>] watchdog_overflow_callback+0x9c/0xa7
[<ffffffff8113feb6>] __perf_event_overflow+0x137/0x1cb
[<ffffffff8101dff6>] ? x86_perf_event_set_period+0x103/0x10f
[<ffffffff811403fa>] perf_event_overflow+0x14/0x16
[<ffffffff81023730>] intel_pmu_handle_irq+0x2dc/0x359
[<ffffffff815eee05>] perf_event_nmi_handler+0x19/0x1b
[<ffffffff815ee5f3>] nmi_handle+0x7f/0xc2
[<ffffffff815ee574>] ? oops_begin+0xa9/0xa9
[<ffffffff815ee6f2>] do_nmi+0xbc/0x304
[<ffffffff815edd81>] end_repeat_nmi+0x1e/0x2e
[<ffffffff81099fce>] ? vprintk_emit+0x40a/0x444
[<ffffffff81104efa>] ? stop_machine_cpu_stop+0xda/0x274
[<ffffffff81104efa>] ? stop_machine_cpu_stop+0xda/0x274
[<ffffffff81104efa>] ? stop_machine_cpu_stop+0xda/0x274
<<EOE>> [<ffffffff810f2dab>] ? copy_module_from_fd+0xe7/0xe7
[<ffffffff810f2dab>] ? copy_module_from_fd+0xe7/0xe7
[<ffffffff810f2dab>] ? copy_module_from_fd+0xe7/0xe7
[<ffffffff81104e20>] ? stop_one_cpu_nowait+0x30/0x30
[<ffffffff81104b8d>] cpu_stopper_thread+0xae/0x162
[<ffffffff815ebb1f>] ? __schedule+0x5ef/0x637
[<ffffffff815ecf38>] ? _raw_spin_unlock_irqrestore+0x47/0x7e
[<ffffffff810e92cc>] ? trace_hardirqs_on_caller+0x123/0x15a
[<ffffffff810e9310>] ? trace_hardirqs_on+0xd/0xf
[<ffffffff815ecf61>] ? _raw_spin_unlock_irqrestore+0x70/0x7e
[<ffffffff810bef34>] smpboot_thread_fn+0x258/0x260
[<ffffffff810becdc>] ? test_ti_thread_flag.clone.0+0x11/0x11
[<ffffffff810b7c22>] kthread+0xc7/0xcf
[<ffffffff810b7b5b>] ? __init_kthread_worker+0x5b/0x5b
[<ffffffff815f3b6c>] ret_from_fork+0x7c/0xb0
[<ffffffff810b7b5b>] ? __init_kthread_worker+0x5b/0x5b
---[ end trace 4947dfa9b0a4cec4 ]---
------------[ cut here ]------------
WARNING: at /home/greearb/git/linux-2.6.linus/kernel/watchdog.c:245 watchdog_overflow_callback+0x9c/0xa7()
Hardware name: To be filled by O.E.M.
Watchdog detected hard LOCKUP on cpu 3
Modules linked in: ath9k ath9k_common ath9k_hw ath mac80211 cfg80211 nfsv4 auth_rpcgss nfs fscache nf_nat_ipv4 nf_nat veth 8021q garp stp mrp llc pktgen lockd sunrpc]
Pid: 29, comm: migration/3 Tainted: G WC 3.9.4+ #11
Call Trace:
<NMI> [<ffffffff810977f1>] warn_slowpath_common+0x85/0x9f
[<ffffffff810978ae>] warn_slowpath_fmt+0x46/0x48
[<ffffffff8110f42d>] watchdog_overflow_callback+0x9c/0xa7
[<ffffffff8113feb6>] __perf_event_overflow+0x137/0x1cb
[<ffffffff8101dff6>] ? x86_perf_event_set_period+0x103/0x10f
[<ffffffff811403fa>] perf_event_overflow+0x14/0x16
[<ffffffff81023730>] intel_pmu_handle_irq+0x2dc/0x359
[<ffffffff815eee05>] perf_event_nmi_handler+0x19/0x1b
[<ffffffff815ee5f3>] nmi_handle+0x7f/0xc2
[<ffffffff815ee574>] ? oops_begin+0xa9/0xa9
[<ffffffff815ee6f2>] do_nmi+0xbc/0x304
[<ffffffff815edd81>] end_repeat_nmi+0x1e/0x2e
[<ffffffff81099fce>] ? vprintk_emit+0x40a/0x444
[<ffffffff81104efa>] ? stop_machine_cpu_stop+0xda/0x274
[<ffffffff81104efa>] ? stop_machine_cpu_stop+0xda/0x274
[<ffffffff81104efa>] ? stop_machine_cpu_stop+0xda/0x274
<<EOE>> [<ffffffff810f2dab>] ? copy_module_from_fd+0xe7/0xe7
[<ffffffff810f2dab>] ? copy_module_from_fd+0xe7/0xe7
[<ffffffff810f2dab>] ? copy_module_from_fd+0xe7/0xe7
[<ffffffff81104e20>] ? stop_one_cpu_nowait+0x30/0x30
[<ffffffff81104b8d>] cpu_stopper_thread+0xae/0x162
[<ffffffff815ebb1f>] ? __schedule+0x5ef/0x637
[<ffffffff815ecf38>] ? _raw_spin_unlock_irqrestore+0x47/0x7e
[<ffffffff810e92cc>] ? trace_hardirqs_on_caller+0x123/0x15a
[<ffffffff810e9310>] ? trace_hardirqs_on+0xd/0xf
[<ffffffff815ecf61>] ? _raw_spin_unlock_irqrestore+0x70/0x7e
[<ffffffff810bef34>] smpboot_thread_fn+0x258/0x260
[<ffffffff810becdc>] ? test_ti_thread_flag.clone.0+0x11/0x11
[<ffffffff810b7c22>] kthread+0xc7/0xcf
[<ffffffff810b7b5b>] ? __init_kthread_worker+0x5b/0x5b
[<ffffffff815f3b6c>] ret_from_fork+0x7c/0xb0
[<ffffffff810b7b5b>] ? __init_kthread_worker+0x5b/0x5b
---[ end trace 4947dfa9b0a4cec5 ]---
BUG: soft lockup - CPU#1 stuck for 22s! [migration/1:17]
Modules linked in: ath9k ath9k_common ath9k_hw ath mac80211 cfg80211 nfsv4 auth_rpcgss nfs fscache nf_nat_ipv4 nf_nat veth 8021q garp stp mrp llc pktgen lockd sunrpc]
irq event stamp: 1774512131
hardirqs last enabled at (1774512130): [<ffffffff8109f4c1>] __do_softirq+0x9f/0x257
hardirqs last disabled at (1774512131): [<ffffffff815f48ad>] apic_timer_interrupt+0x6d/0x80
softirqs last enabled at (5654720): [<ffffffff8109f621>] __do_softirq+0x1ff/0x257
softirqs last disabled at (5654725): [<ffffffff8109f743>] irq_exit+0x5f/0xbb
CPU 1
Pid: 17, comm: migration/1 Tainted: G WC 3.9.4+ #11 To be filled by O.E.M. To be filled by O.E.M./To be filled by O.E.M.
RIP: 0010:[<ffffffff8109f4c5>] [<ffffffff8109f4c5>] __do_softirq+0xa3/0x257
RSP: 0018:ffff88022bc83ef8 EFLAGS: 00000202
RAX: ffff8802223142c0 RBX: ffff880217deb710 RCX: 0000000000000006
RDX: ffff88022230e010 RSI: 0000000000000000 RDI: ffff8802223142c0
RBP: ffff88022bc83f78 R08: ffffffff81a050b0 R09: ffff88022bc83cc8
R10: 00000000000005f2 R11: ffff8802203aaf50 R12: ffff88022bc83e68
R13: ffffffff815f48b2 R14: ffff88022bc83f78 R15: ffff88022230e000
FS: 0000000000000000(0000) GS:ffff88022bc80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000430070 CR3: 00000001cbc5d000 CR4: 00000000000007e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process migration/1 (pid: 17, threadinfo ffff88022230e000, task ffff8802223142c0)
Stack:
ffff88022bc83f08 ffff88022230e010 042080402bc83f88 000000010021bfcd
000000012bc83fa8 ffff88022230e000 ffff88022230ffd8 0000000000000038
ffff880200000006 00000248d8cdab1c 1304da35fe841722 ffff88022bc8dc80
Call Trace:
<IRQ>
[<ffffffff8109f743>] irq_exit+0x5f/0xbb
[<ffffffff815f59fd>] smp_apic_timer_interrupt+0x8a/0x98
[<ffffffff815f48b2>] apic_timer_interrupt+0x72/0x80
<EOI>
[<ffffffff81099fdb>] ? vprintk_emit+0x417/0x444
[<ffffffff815e9fc0>] printk+0x4d/0x4f
[<ffffffff81104b36>] ? cpu_stopper_thread+0x57/0x162
[<ffffffff8110504c>] stop_machine_cpu_stop+0x22c/0x274
[<ffffffff810f2dab>] ? copy_module_from_fd+0xe7/0xe7
[<ffffffff810f2dab>] ? copy_module_from_fd+0xe7/0xe7
[<ffffffff810f2dab>] ? copy_module_from_fd+0xe7/0xe7
[<ffffffff81104e20>] ? stop_one_cpu_nowait+0x30/0x30
[<ffffffff81104b8d>] cpu_stopper_thread+0xae/0x162
[<ffffffff815ebb1f>] ? __schedule+0x5ef/0x637
[<ffffffff815ecf38>] ? _raw_spin_unlock_irqrestore+0x47/0x7e
[<ffffffff810e92cc>] ? trace_hardirqs_on_caller+0x123/0x15a
[<ffffffff810e9310>] ? trace_hardirqs_on+0xd/0xf
[<ffffffff815ecf61>] ? _raw_spin_unlock_irqrestore+0x70/0x7e
[<ffffffff810bef34>] smpboot_thread_fn+0x258/0x260
[<ffffffff810becdc>] ? test_ti_thread_flag.clone.0+0x11/0x11
[<ffffffff810b7c22>] kthread+0xc7/0xcf
[<ffffffff810b7b5b>] ? __init_kthread_worker+0x5b/0x5b
[<ffffffff815f3b6c>] ret_from_fork+0x7c/0xb0
[<ffffffff810b7b5b>] ? __init_kthread_worker+0x5b/0x5b
Code: 55 b0 49 81 ec d8 1f 00 00 49 8d 44 24 10 4c 89 65 a8 48 89 45 88 65 c7 04 25 80 1b 01 00 00 00 00 00 e8 42 9e 04 00 fb 66 66 90 <66> 66 90 48 c7 c3 80 50 a0 8
BUG: soft lockup - CPU#1 stuck for 22s! [migration/1:17]
Modules linked in: ath9k ath9k_common ath9k_hw ath mac80211 cfg80211 nfsv4 auth_rpcgss nfs fscache nf_nat_ipv4 nf_nat veth 8021q garp stp mrp llc pktgen lockd sunrpc]
irq event stamp: 2713027507
hardirqs last enabled at (2713027506): [<ffffffff8109f4c1>] __do_softirq+0x9f/0x257
hardirqs last disabled at (2713027507): [<ffffffff815f48ad>] apic_timer_interrupt+0x6d/0x80
softirqs last enabled at (5654720): [<ffffffff8109f621>] __do_softirq+0x1ff/0x257
softirqs last disabled at (5654725): [<ffffffff8109f743>] irq_exit+0x5f/0xbb
CPU 1
Pid: 17, comm: migration/1 Tainted: G WC 3.9.4+ #11 To be filled by O.E.M. To be filled by O.E.M./To be filled by O.E.M.
RIP: 0010:[<ffffffff8109f4c5>] [<ffffffff8109f4c5>] __do_softirq+0xa3/0x257
RSP: 0018:ffff88022bc83ef8 EFLAGS: 00000286
RAX: ffff8802223142c0 RBX: ffff880217deb710 RCX: 0000000000000006
RDX: ffff88022230e010 RSI: 0000000000000000 RDI: ffff8802223142c0
RBP: ffff88022bc83f78 R08: ffffffff81a050b0 R09: ffff88022bc83cc8
R10: 00000000000005f2 R11: ffff88022bc83c38 R12: ffff88022bc83e68
R13: ffffffff815f48b2 R14: ffff88022bc83f78 R15: ffff88022230e000
FS: 0000000000000000(0000) GS:ffff88022bc80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000430070 CR3: 00000001cbc5d000 CR4: 00000000000007e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process migration/1 (pid: 17, threadinfo ffff88022230e000, task ffff8802223142c0)
Stack:
ffff88022bc83f08 ffff88022230e010 042080402bc83f88 000000010021bfcd
000000012bc83fa8 ffff88022230e000 ffff88022230ffd8 0000000000000038
ffff880200000006 00000248d8cdab1c 1304da35fe841722 ffff88022bc8dc80
Call Trace:
<IRQ>
[<ffffffff8109f743>] irq_exit+0x5f/0xbb
[<ffffffff815f59fd>] smp_apic_timer_interrupt+0x8a/0x98
[<ffffffff815f48b2>] apic_timer_interrupt+0x72/0x80
<EOI>
[<ffffffff81099fdb>] ? vprintk_emit+0x417/0x444
[<ffffffff815e9fc0>] printk+0x4d/0x4f
[<ffffffff81104b36>] ? cpu_stopper_thread+0x57/0x162
[<ffffffff8110504c>] stop_machine_cpu_stop+0x22c/0x274
[<ffffffff810f2dab>] ? copy_module_from_fd+0xe7/0xe7
[<ffffffff810f2dab>] ? copy_module_from_fd+0xe7/0xe7
[<ffffffff810f2dab>] ? copy_module_from_fd+0xe7/0xe7
[<ffffffff81104e20>] ? stop_one_cpu_nowait+0x30/0x30
[<ffffffff81104b8d>] cpu_stopper_thread+0xae/0x162
[<ffffffff815ebb1f>] ? __schedule+0x5ef/0x637
[<ffffffff815ecf38>] ? _raw_spin_unlock_irqrestore+0x47/0x7e
[<ffffffff810e92cc>] ? trace_hardirqs_on_caller+0x123/0x15a
[<ffffffff810e9310>] ? trace_hardirqs_on+0xd/0xf
[<ffffffff815ecf61>] ? _raw_spin_unlock_irqrestore+0x70/0x7e
[<ffffffff810bef34>] smpboot_thread_fn+0x258/0x260
[<ffffffff810becdc>] ? test_ti_thread_flag.clone.0+0x11/0x11
[<ffffffff810b7c22>] kthread+0xc7/0xcf
[<ffffffff810b7b5b>] ? __init_kthread_worker+0x5b/0x5b
[<ffffffff815f3b6c>] ret_from_fork+0x7c/0xb0
[<ffffffff810b7b5b>] ? __init_kthread_worker+0x5b/0x5b
Code: 55 b0 49 81 ec d8 1f 00 00 49 8d 44 24 10 4c 89 65 a8 48 89 45 88 65 c7 04 25 80 1b 01 00 00 00 00 00 e8 42 9e 04 00 fb 66 66 90 <66> 66 90 48 c7 c3 80 50 a0 8

Thanks,
Ben

--
Ben Greear <greearb@xxxxxxxxxxxxxxx>
Candela Technologies Inc http://www.candelatech.com

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/