rcu_sched self-detected stall on 4.2-rc6

From: Coelho, Luciano
Date: Thu Aug 20 2015 - 06:51:23 EST


Hi,

Yesterday my machine, running 4.2-rc6, suddenly hit an RCU stall. I don't
know what led to it; I just started getting "BUG: soft lockup" messages
on all my terminals.

Here's a small extract of what the logs show:

[88989.550488] INFO: rcu_sched self-detected stall on CPU { 0} (t=5250 jiffies g=2697311 c=2697310 q=12358)
[88989.550496] Task dump for CPU 0:
[88989.550499] chrome R running task 0 2311 2241 0x00000108
[88989.550502] 0000000000000001 ffffffff81854ac0 ffffffff810c33a0 000000000029285f
[88989.550505] ffff88031ea16500 ffffffff81854ac0 0000000000000000 ffffffff81907580
[88989.550508] ffffffff810c6501 ffff88030c7b9140 ffffffff81088d05 ffffffff81ac14c0
[88989.550510] Call Trace:
[88989.550512] <IRQ> [<ffffffff810c33a0>] ? rcu_dump_cpu_stacks+0x80/0xb0
[88989.550522] [<ffffffff810c6501>] ? rcu_check_callbacks+0x421/0x6e0
[88989.550525] [<ffffffff81088d05>] ? notifier_call_chain+0x45/0x70
[88989.550528] [<ffffffff810d02e1>] ? timekeeping_update+0xf1/0x150
[88989.550531] [<ffffffff810d9290>] ? tick_sched_handle.isra.15+0x60/0x60
[88989.550534] [<ffffffff810caf66>] ? update_process_times+0x36/0x60
[88989.550537] [<ffffffff810d9290>] ? tick_sched_handle.isra.15+0x60/0x60
[88989.550539] [<ffffffff810d9254>] ? tick_sched_handle.isra.15+0x24/0x60
[88989.550542] [<ffffffff810d9290>] ? tick_sched_handle.isra.15+0x60/0x60
[88989.550545] [<ffffffff810d92cb>] ? tick_sched_timer+0x3b/0x70
[88989.550547] [<ffffffff810cba66>] ? __hrtimer_run_queues+0xd6/0x200
[88989.550551] [<ffffffff8101c2e5>] ? read_tsc+0x5/0x10
[88989.550554] [<ffffffff810cbe7a>] ? hrtimer_interrupt+0x9a/0x180
[88989.550558] [<ffffffff815692b9>] ? smp_apic_timer_interrupt+0x39/0x50
[88989.550560] [<ffffffff8156749b>] ? apic_timer_interrupt+0x6b/0x70
[88989.550563] [<ffffffff810c9770>] ? del_timer+0x60/0x60
[88989.550565] [<ffffffff810c9814>] ? del_timer_sync+0x44/0x50
[88989.550569] [<ffffffff814b6ea0>] ? inet_csk_reqsk_queue_drop+0x60/0x1b0
[88989.550572] [<ffffffff814b70df>] ? reqsk_timer_handler+0xef/0x280
[88989.550574] [<ffffffff814b6ff0>] ? inet_csk_reqsk_queue_drop+0x1b0/0x1b0
[88989.550576] [<ffffffff810c9020>] ? call_timer_fn+0x30/0xe0
[88989.550578] [<ffffffff814b6ff0>] ? inet_csk_reqsk_queue_drop+0x1b0/0x1b0
[88989.550581] [<ffffffff810c9513>] ? run_timer_softirq+0x163/0x280
[88989.550583] [<ffffffff8101c2e5>] ? read_tsc+0x5/0x10
[88989.550586] [<ffffffff8106fbfe>] ? __do_softirq+0xfe/0x250
[88989.550589] [<ffffffff8106fec2>] ? irq_exit+0x92/0xa0
[88989.550592] [<ffffffff815692be>] ? smp_apic_timer_interrupt+0x3e/0x50
[88989.550594] [<ffffffff8156749b>] ? apic_timer_interrupt+0x6b/0x70
[88989.550595] <EOI>
[89008.593604] NMI watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [swapper/2:0]

The full log can be found here (there are lots of drm/i915 warnings
too, but I think they're unrelated):

http://pastebin.coelho.fi/265ebd1dcd443446.txt
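
In case it helps anyone going through a log like this one, here is a minimal sketch for pulling out the stall and soft-lockup lines. It is only an illustration: the regular expressions assume the exact message format in the extract above (other kernel versions print these lines differently), and the names STALL_RE, LOCKUP_RE and scan_log are made up for this example.

```python
import re

# Assumed format: the "self-detected stall" line exactly as it appears in
# the extract above. This is not a general parser for RCU stall warnings.
STALL_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\] INFO: (?P<flavor>rcu_\w+) self-detected stall on CPU"
    r" \{\s*(?P<cpu>\d+)\}\s+"
    r"\(t=(?P<t>\d+) jiffies g=(?P<g>\d+) c=(?P<c>\d+) q=(?P<q>\d+)\)"
)

# Assumed format: the "NMI watchdog: BUG: soft lockup" line as shown above.
LOCKUP_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\] NMI watchdog: BUG: soft lockup - CPU#(?P<cpu>\d+)"
    r" stuck for (?P<secs>\d+)s! \[(?P<comm>[^:\]]+):(?P<pid>\d+)\]"
)


def scan_log(lines):
    """Return (stalls, lockups): two lists of dicts, one dict per matching line."""
    stalls, lockups = [], []
    for line in lines:
        m = STALL_RE.search(line)
        if m:
            stalls.append({
                "ts": float(m.group("ts")),       # dmesg timestamp in seconds
                "flavor": m.group("flavor"),      # e.g. "rcu_sched"
                "cpu": int(m.group("cpu")),
                "t_jiffies": int(m.group("t")),   # the t= value
                "g": int(m.group("g")),           # the g= value
                "c": int(m.group("c")),           # the c= value
                "q": int(m.group("q")),           # the q= value
            })
            continue
        m = LOCKUP_RE.search(line)
        if m:
            lockups.append({
                "ts": float(m.group("ts")),
                "cpu": int(m.group("cpu")),
                "stuck_secs": int(m.group("secs")),
                "comm": m.group("comm"),          # task name, e.g. "swapper/2"
                "pid": int(m.group("pid")),
            })
    return stalls, lockups
```

With the full log saved to a file, calling scan_log(open(path)) returns one entry per stall or lockup message, which makes it easy to see which CPUs were reported and in what order.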

Has anyone else seen this? Or does anyone have a clue as to what might be causing it?

--
Cheers,
Luca.