Re: 4.2-rc5 rcu stalls.

From: Dave Jones
Date: Thu Aug 06 2015 - 00:15:44 EST


On Wed, Aug 05, 2015 at 02:37:59PM +0200, Frederic Weisbecker wrote:
> On Tue, Aug 04, 2015 at 08:12:50PM -0400, Dave Jones wrote:
> > On Tue, Aug 04, 2015 at 12:54:35AM -0400, Sasha Levin wrote:
> > > On 08/03/2015 06:03 PM, Paul E. McKenney wrote:
> > > >> > Ugh, that doesn't revert cleanly. Got something handy ?
> > > > I do not, but perhaps either Sasha or Frederic do.
> > >
> > > I've attached a revert courtesy of Peter.
> >
> > Thanks. At first I thought this was doing the trick, but then I hit this again.
> >
> >
> > [23643.545873] INFO: rcu_preempt detected stalls on CPUs/tasks:
> > [23643.546031] Tasks blocked on level-0 rcu_node (CPUs 0-3): P31722
> > [23643.546173] (detected by 3, t=65002 jiffies, g=2256887, c=2256886, q=0)
> > [23643.546326] trinity-watchdo R running task 14336 31722 31721 0x00080000
> > [23643.546488] ffff8804fcfe7cc8 000000000000ded0 0000000000000002 ffff8804f58bb680
> > [23643.546661] ffff8800ce4951c0 ffff8804fcfe7cb8 ffff8804fcfe8000 ffff8804f6552608
> > [23643.546830] 0000000000000009 ffff8804fcfe7e88 0000000000000009 ffff8804fcfe7ce8
> > [23643.547001] Call Trace:
> > [23643.547058] [<ffffffff887fa2b2>] preempt_schedule_common+0x22/0x40
> > [23643.547201] [<ffffffff887fa2ef>] preempt_schedule+0x1f/0x30
> > [23643.547329] [<ffffffff88001058>] ___preempt_schedule+0x12/0x14
> > [23643.547465] [<ffffffff8808b76d>] ? do_send_sig_info+0x5d/0x80
> > [23643.547599] [<ffffffff887fff32>] ? _raw_spin_unlock_irqrestore+0x42/0x70
> > [23643.547753] [<ffffffff887fff50>] ? _raw_spin_unlock_irqrestore+0x60/0x70
> > [23643.547910] [<ffffffff8808b76d>] do_send_sig_info+0x5d/0x80
> > [23643.548039] [<ffffffff8808be62>] group_send_sig_info+0xb2/0x120
> > [23643.548175] [<ffffffff8808bdb5>] ? group_send_sig_info+0x5/0x120
> > [23643.548314] [<ffffffff880ea62f>] ? rcu_read_lock_held+0x4f/0x60
> > [23643.548451] [<ffffffff8808c05f>] kill_pid_info+0x7f/0x150
> > [23643.548576] [<ffffffff8808c000>] ? kill_pid_info+0x20/0x150
> > [23643.548705] [<ffffffff8808c244>] SYSC_kill+0xf4/0x2b0
> > [23643.548821] [<ffffffff8808c1ed>] ? SYSC_kill+0x9d/0x2b0
> > [23643.548942] [<ffffffff880d35cb>] ? trace_hardirqs_on_caller+0x14b/0x1e0
> > [23643.549097] [<ffffffff880d366d>] ? trace_hardirqs_on+0xd/0x10
> > [23643.549231] [<ffffffff88192f63>] ? context_tracking_user_exit+0x13/0x20
> > [23643.549387] [<ffffffff88012c47>] ? syscall_trace_enter_phase1+0xf7/0x150
> > [23643.549540] [<ffffffff88001017>] ? trace_hardirqs_on_thunk+0x17/0x19
> > [23643.549687] [<ffffffff8808e64e>] SyS_kill+0xe/0x10
> > [23643.549799] [<ffffffff88800997>] entry_SYSCALL_64_fastpath+0x12/0x6f
>
> If it still happens after Sasha's revert, which basically revert all the offending
> patches related to preempt lately, then the reason might be elsewhere.
>
> How hard was it to reproduce? I see 23000 secs in your dmesg logs which is around 6 hours.

Interestingly, it happened again, but only after 7 hours.
I've yet to trigger it in a shorter timeframe. Frustrating.

[28190.798758] INFO: rcu_preempt detected stalls on CPUs/tasks:
[28190.798914] Tasks blocked on level-0 rcu_node (CPUs 0-3): P32189
[28190.799054] (detected by 1, t=65002 jiffies, g=2137396, c=2137395, q=0)
[28190.799203] trinity-c224 R running task 13856 32189 31964 0x00080000
[28190.799362] ffff8804f2323da8 ffffffffa67fa4d1 ffff8804fe170000 ffff8804b66db680
[28190.799531] ffff8804fe170000 ffff8804f2323d98 0000000000000000 ffff8804f2324000
[28190.799699] 0000000000000002 0000000000000000 0000000000000000 ffff8804f2323dc8
[28190.799866] Call Trace:
[28190.799921] [<ffffffffa67fa4d1>] ? preempt_schedule_irq+0x41/0xa0
[28190.800058] [<ffffffffa67fa4d7>] preempt_schedule_irq+0x47/0xa0
[28190.800191] [<ffffffffa6801529>] retint_kernel+0x1b/0x2d
[28190.800312] [<ffffffffa60d6319>] ? lock_acquire+0xd9/0x260
[28190.800438] [<ffffffffa609d295>] ? __task_pid_nr_ns+0x5/0x190
[28190.800568] [<ffffffffa680153b>] ? retint_kernel+0x2d/0x2d
[28190.800691] [<ffffffffa609d2d2>] __task_pid_nr_ns+0x42/0x190
[28190.800820] [<ffffffffa609d295>] ? __task_pid_nr_ns+0x5/0x190
[28190.800950] [<ffffffffa6091f0b>] sys_gettid+0x1b/0x20
[28190.801064] [<ffffffffa6800997>] entry_SYSCALL_64_fastpath+0x12/0x6f
[28190.801208] trinity-c224 R running task 13856 32189 31964 0x00080000
[28190.801365] ffff8804f2323da8 ffffffffa67fa4d1 ffff8804fe170000 ffff8804b66db680
[28190.801533] ffff8804fe170000 ffff8804f2323d98 0000000000000000 ffff8804f2324000
[28190.801702] 0000000000000002 0000000000000000 0000000000000000 ffff8804f2323dc8
[28190.801870] Call Trace:
[28190.801923] [<ffffffffa67fa4d1>] ? preempt_schedule_irq+0x41/0xa0
[28190.802060] [<ffffffffa67fa4d7>] preempt_schedule_irq+0x47/0xa0
[28190.802193] [<ffffffffa6801529>] retint_kernel+0x1b/0x2d
[28190.802313] [<ffffffffa60d6319>] ? lock_acquire+0xd9/0x260
[28190.802436] [<ffffffffa609d295>] ? __task_pid_nr_ns+0x5/0x190
[28190.802565] [<ffffffffa680153b>] ? retint_kernel+0x2d/0x2d
[28190.802688] [<ffffffffa609d2d2>] __task_pid_nr_ns+0x42/0x190
[28190.802815] [<ffffffffa609d295>] ? __task_pid_nr_ns+0x5/0x190
[28190.802945] [<ffffffffa6091f0b>] sys_gettid+0x1b/0x20
[28190.803058] [<ffffffffa6800997>] entry_SYSCALL_64_fastpath+0x12/0x6f
[29929.492752] INFO: rcu_preempt detected stalls on CPUs/tasks:
[29929.492906] Tasks blocked on level-0 rcu_node (CPUs 0-3): P289
[29929.493039] (detected by 0, t=65002 jiffies, g=2141006, c=2141005, q=0)
[29929.493188] systemd-journal R running task 12464 289 1 0x00080000
[29929.493347] ffff8804ff2bbae8 ffffffffa67fa4d1 ffff880501f81b40 ffff880503d43680
[29929.493515] ffff880501f81b40 ffff8804ff2bbad8 0000000000000000 ffff8804ff2bc000
[29929.493683] ffff8800d3e9f118 ffff8800d3e9eb40 0000000000000056 ffff8804ff2bbb08
[29929.493853] Call Trace:
[29929.493909] [<ffffffffa67fa4d1>] ? preempt_schedule_irq+0x41/0xa0
[29929.494046] [<ffffffffa67fa4d7>] preempt_schedule_irq+0x47/0xa0
[29929.494181] [<ffffffffa6801529>] retint_kernel+0x1b/0x2d
[29929.494304] [<ffffffffa67f9929>] ? __schedule+0x439/0xb20
[29929.494430] [<ffffffffa6001058>] ? ___preempt_schedule+0x12/0x14
[29929.494568] [<ffffffffa6001058>] ? ___preempt_schedule+0x12/0x14
[29929.494709] [<ffffffffa66b8b11>] ? sock_def_readable+0x161/0x190
[29929.501118] [<ffffffffa60ed468>] ? rcu_is_watching+0x38/0x60
[29929.507566] [<ffffffffa60ed481>] ? rcu_is_watching+0x51/0x60
[29929.513987] [<ffffffffa66b8b11>] sock_def_readable+0x161/0x190
[29929.520344] [<ffffffffa66b89b5>] ? sock_def_readable+0x5/0x190
[29929.526678] [<ffffffffa67ffe85>] ? _raw_spin_unlock+0x35/0x60
[29929.532988] [<ffffffffa67986c9>] unix_dgram_sendmsg+0x4f9/0x570
[29929.539184] [<ffffffffa66b509b>] ___sys_sendmsg+0x30b/0x320
[29929.545270] [<ffffffffa60cfe7e>] ? put_lock_stats.isra.29+0xe/0x30
[29929.551331] [<ffffffffa638a137>] ? debug_smp_processor_id+0x17/0x20
[29929.557285] [<ffffffffa60cfe7e>] ? put_lock_stats.isra.29+0xe/0x30
[29929.563204] [<ffffffffa60ad681>] ? get_parent_ip+0x11/0x50
[29929.569047] [<ffffffffa60ad813>] ? preempt_count_sub+0xa3/0xf0
[29929.574796] [<ffffffffa621a626>] ? __fget_light+0x66/0x90
[29929.580555] [<ffffffffa6192d53>] ? context_tracking_exit+0x43/0x240
[29929.586253] [<ffffffffa66b5712>] __sys_sendmsg+0x42/0x80
[29929.591843] [<ffffffffa66b5762>] SyS_sendmsg+0x12/0x20
[29929.597385] [<ffffffffa6800997>] entry_SYSCALL_64_fastpath+0x12/0x6f


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/