Re: frequent lockups in 3.18rc4

From: Sasha Levin
Date: Mon Dec 08 2014 - 09:33:50 EST


On 12/08/2014 12:20 AM, Paul E. McKenney wrote:
> I have seen this caused by lost IPIs, but you have to lose two of them,
> which seems less than fully likely.

It does seem that it can cause full blown stalls as well, just pretty
rarely (notice the lack of any prints before):

[11373.032327] audit: type=1326 audit(1397703594.974:502): auid=4294967295 uid=7884781 gid=0 ses=4294967295 pid=9853 comm="trinity-c768" exe="/trinity/trinity" sig=9 arch=c0000
03e syscall=96 compat=0 ip=0x7fff2c3fee47 code=0x0
[11374.565881] audit: type=1326 audit(1397703596.504:503): auid=4294967295 uid=32 gid=0
ses=4294967295 pid=9801 comm="trinity-c710" exe="/trinity/trinity" sig=9 arch=c000003e syscall=96 compat=0 ip=0x7fff2c3fee47 code=0x0
[11839.353539] Hangcheck: hangcheck value past margin!
[12040.010128] INFO: rcu_sched detected stalls on CPUs/tasks:
[12040.012072] (detected by 4, t=213513 jiffies, g=-222, c=-223, q=0)
[12040.014200] INFO: Stall ended before state dump start
[12159.730069] INFO: rcu_preempt detected stalls on CPUs/tasks:
[12159.730069] (detected by 3, t=396537 jiffies, g=24095, c=24094, q=1346)
[12159.730069] INFO: Stall ended before state dump start
[12602.162439] Hangcheck: hangcheck value past margin!
[12655.560806] INFO: rcu_sched detected stalls on CPUs/tasks:
[12655.560806] 0: (3 ticks this GP) idle=bc3/140000000000002/0 softirq=26674/26674 last_accelerate: b2a8/da68, nonlazy_posted: 20893, ..
[12655.602171] (detected by 13, t=30506 jiffies, g=-219, c=-220, q=0)
[12655.602171] Task dump for CPU 0:
[12655.602171] trinity-c39 R running task 11904 6558 26120 0x0008000c
[12655.602171] ffffffff81593bf7 ffff8808d5d58d40 0000000000000282 ffffffff9ef40538
[12655.602171] ffff880481400000 ffff8808d5dcb638 ffffffff83f2ed2b ffffffff9eaa1718
[12655.602171] ffff8808d5d58d08 00000b820c44b0ae 0000000000000000 0000000000000001
[12655.602171] Call Trace:
[12655.602171] [<ffffffff81593bf7>] ? trace_hardirqs_on_caller+0x677/0x900
[12655.602171] [<ffffffff83f2ed2b>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[12655.602171] [<ffffffff91677ffb>] ? retint_restore_args+0x13/0x13
[12655.602171] [<ffffffff91676612>] ? _raw_spin_unlock_irqrestore+0xa2/0xf0
[12655.602171] [<ffffffff83f93705>] ? __debug_check_no_obj_freed+0x2f5/0xd90
[12655.602171] [<ffffffff81593bf7>] ? trace_hardirqs_on_caller+0x677/0x900
[12655.602171] [<ffffffff83f95ba9>] ? debug_check_no_obj_freed+0x19/0x20
[12655.602171] [<ffffffff819049bf>] ? free_pages_prepare+0x5bf/0x1000
[12655.602171] [<ffffffff83f90fe3>] ? __this_cpu_preempt_check+0x13/0x20
[12655.602171] [<ffffffff8190cacd>] ? __free_pages_ok+0x3d/0x360
[12655.602171] [<ffffffff8190ce7d>] ? free_compound_page+0x8d/0xd0
[12655.602171] [<ffffffff81929986>] ? __put_compound_page+0x46/0x70
[12655.602171] [<ffffffff8192b395>] ? put_compound_page+0xf5/0x10e0
[12655.602171] [<ffffffff814d99ab>] ? preempt_count_sub+0x11b/0x1d0
[12655.602171] [<ffffffff8192da4d>] ? release_pages+0x41d/0x6f0
[12655.602171] [<ffffffff81a0188b>] ? free_pages_and_swap_cache+0x11b/0x1a0
[12655.602171] [<ffffffff819a6b92>] ? tlb_flush_mmu_free+0x72/0x180
[12655.602171] [<ffffffff819ace76>] ? unmap_single_vma+0x1326/0x2170
[12655.602171] [<ffffffff83f90fe3>] ? __this_cpu_preempt_check+0x13/0x20
[12655.602171] [<ffffffff819b0644>] ? unmap_vmas+0xd4/0x250
[12655.602171] [<ffffffff819d62c9>] ? exit_mmap+0x169/0x610
[12655.602171] [<ffffffff81a678fd>] ? kmem_cache_free+0x7cd/0xbb0
[12655.602171] [<ffffffff814095b2>] ? mmput+0xd2/0x2c0
[12655.602171] [<ffffffff81423551>] ? do_exit+0x7e1/0x39c0
[12655.602171] [<ffffffff81456fb2>] ? get_signal+0x7a2/0x2130
[12655.602171] [<ffffffff81426891>] ? do_group_exit+0x101/0x490
[12655.602171] [<ffffffff814d99ab>] ? preempt_count_sub+0x11b/0x1d0
[12655.602171] [<ffffffff81456f4e>] ? get_signal+0x73e/0x2130
[12655.602171] [<ffffffff811d59f1>] ? sched_clock+0x31/0x50
[12655.602171] [<ffffffff81585ded>] ? get_lock_stats+0x1d/0x100
[12655.602171] [<ffffffff811ac828>] ? do_signal+0x28/0x3750
[12655.602171] [<ffffffff814f7f73>] ? vtime_account_user+0x173/0x220
[12655.602171] [<ffffffff814d96c1>] ? get_parent_ip+0x11/0x50
[12655.602171] [<ffffffff83f90fe3>] ? __this_cpu_preempt_check+0x13/0x20
[12655.602171] [<ffffffff81593bf7>] ? trace_hardirqs_on_caller+0x677/0x900
[12655.602171] [<ffffffff81593e8d>] ? trace_hardirqs_on+0xd/0x10
[12655.602171] [<ffffffff811affb9>] ? do_notify_resume+0x69/0x100
[12655.602171] [<ffffffff9167744f>] ? int_signal+0x12/0x17


Thanks,
Sasha
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/