Re: 4.2-rc5 rcu stalls.

From: Paul E. McKenney
Date: Mon Aug 03 2015 - 17:37:32 EST


On Mon, Aug 03, 2015 at 05:08:35PM -0400, Dave Jones wrote:
> Content analysis details: (-2.9 points, 5.0 required)
>
> pts rule name description
> ---- ---------------------- --------------------------------------------------
> -1.0 ALL_TRUSTED Passed through trusted hosts only via SMTP
> -1.9 BAYES_00 BODY: Bayes spam probability is 0 to 1%
> [score: 0.0000]
> X-Authenticated-User: davej@xxxxxxxxxxxxxxxxx
> X-ZLA-Header: unknown; 0
> X-ZLA-DetailInfo: BA=6.00003574; NDR=6.00000001; ZLA=6.00000002; ZF=6.00000004; ZB=6.00041153; ZH=6.00102541; ZP=6.00079473; ZU=6.00000001; UDB=6.00244655; UTC=2015-08-03 21:08:45
> x-cbid: 15080321-8236-0000-0000-00000DB1C0C1
> X-IBM-ISS-SpamDetectors: Score=0.405233; BY=0; FL=0; FP=0; FZ=0; HX=0; KW=0;
> PH=0; RB=0; SC=0.405233; ST=0; TS=0; UL=0; ISC=
> X-IBM-ISS-DetailInfo: BY=3.00004248; HX=3.00000236; KW=3.00000007;
> PH=3.00000003; SC=3.00000115; SDB=6.00568899; UDB=6.00244655; UTC=2015-08-03
> 21:08:47
> X-TM-AS-MML: disable
>
> [ 2120.854974] INFO: rcu_preempt detected stalls on CPUs/tasks:
> [ 2120.855128] Tasks blocked on level-0 rcu_node (CPUs 0-3): P1497
> [ 2120.855263] (detected by 2, t=65002 jiffies, g=78835, c=78834, q=0)
> [ 2120.855403] trinity-watchdo R running task 14336 1497 1496 0x00080000
> [ 2120.855563] ffff8804b94e3c88 ffffffffa17fa0b0 ffff8805010a1b40 ffff8804f58b51c0
> [ 2120.855728] ffff8805010a1b40 ffff8804b94e3c78 0000000000000000 ffff8804b94e4000
> [ 2120.855893] 0000000000000001 0000000000000001 0000000000000002 ffff8804b94e3ca8
> [ 2120.856062] Call Trace:
> [ 2120.856116] [<ffffffffa17fa0b0>] ? preempt_schedule_irq+0x40/0xa0
> [ 2120.856252] [<ffffffffa17fa0b6>] preempt_schedule_irq+0x46/0xa0

Sasha Levin (CCed) was having roughly similar stalls, and found that
reverting b30f0e3ffedf (sched/preempt: Optimize preemption operations
on __schedule() callers) made the stalls go away. Does that help in
your case?

Thanx, Paul

> [ 2120.856387] [<ffffffffa1801129>] retint_kernel+0x1b/0x2d
> [ 2120.856507] [<ffffffffa10ad5e1>] ? get_parent_ip+0x11/0x50
> [ 2120.856627] [<ffffffffa10ad5e1>] ? get_parent_ip+0x11/0x50
> [ 2120.856752] [<ffffffffa10d675e>] ? lock_release+0x33e/0x700
> [ 2120.856878] [<ffffffffa108be29>] ? group_send_sig_info+0x79/0x120
> [ 2120.857015] [<ffffffffa10ad5e1>] ? get_parent_ip+0x11/0x50
> [ 2120.857137] [<ffffffffa108c07b>] ? kill_pid_info+0x9b/0x150
> [ 2120.857259] [<ffffffffa108c098>] kill_pid_info+0xb8/0x150
> [ 2120.857380] [<ffffffffa108c000>] ? kill_pid_info+0x20/0x150
> [ 2120.857508] [<ffffffffa108c244>] SYSC_kill+0xf4/0x2b0
> [ 2120.857620] [<ffffffffa108c1ed>] ? SYSC_kill+0x9d/0x2b0
> [ 2120.857736] [<ffffffffa10d354b>] ? trace_hardirqs_on_caller+0x14b/0x1e0
> [ 2120.857882] [<ffffffffa10d35ed>] ? trace_hardirqs_on+0xd/0x10
> [ 2120.858012] [<ffffffffa1192ed3>] ? context_tracking_user_exit+0x13/0x20
> [ 2120.858161] [<ffffffffa1012db7>] ? syscall_trace_enter_phase1+0xf7/0x150
> [ 2120.858314] [<ffffffffa1001017>] ? trace_hardirqs_on_thunk+0x17/0x19
> [ 2120.858456] [<ffffffffa108e64e>] SyS_kill+0xe/0x10
> [ 2120.858563] [<ffffffffa1800597>] entry_SYSCALL_64_fastpath+0x12/0x6f
> [ 2120.858706] trinity-watchdo R running task 14336 1497 1496 0x00080000
> [ 2120.858863] ffff8804b94e3c88 ffffffffa17fa0b0 ffff8805010a1b40 ffff8804f58b51c0
> [ 2120.859028] ffff8805010a1b40 ffff8804b94e3c78 0000000000000000 ffff8804b94e4000
> [ 2120.859194] 0000000000000001 0000000000000001 0000000000000002 ffff8804b94e3ca8
> [ 2120.859358] Call Trace:
> [ 2120.859410] [<ffffffffa17fa0b0>] ? preempt_schedule_irq+0x40/0xa0
> [ 2120.859545] [<ffffffffa17fa0b6>] preempt_schedule_irq+0x46/0xa0
> [ 2120.859676] [<ffffffffa1801129>] retint_kernel+0x1b/0x2d
> [ 2120.859794] [<ffffffffa10ad5e1>] ? get_parent_ip+0x11/0x50
> [ 2120.859916] [<ffffffffa10ad5e1>] ? get_parent_ip+0x11/0x50
> [ 2120.860037] [<ffffffffa10d675e>] ? lock_release+0x33e/0x700
> [ 2120.860163] [<ffffffffa108be29>] ? group_send_sig_info+0x79/0x120
> [ 2120.860297] [<ffffffffa10ad5e1>] ? get_parent_ip+0x11/0x50
> [ 2120.860422] [<ffffffffa108c07b>] ? kill_pid_info+0x9b/0x150
> [ 2120.860548] [<ffffffffa108c098>] kill_pid_info+0xb8/0x150
> [ 2120.860670] [<ffffffffa108c000>] ? kill_pid_info+0x20/0x150
> [ 2120.860793] [<ffffffffa108c244>] SYSC_kill+0xf4/0x2b0
> [ 2120.860906] [<ffffffffa108c1ed>] ? SYSC_kill+0x9d/0x2b0
> [ 2120.861022] [<ffffffffa10d354b>] ? trace_hardirqs_on_caller+0x14b/0x1e0
> [ 2120.861172] [<ffffffffa10d35ed>] ? trace_hardirqs_on+0xd/0x10
> [ 2120.861301] [<ffffffffa1192ed3>] ? context_tracking_user_exit+0x13/0x20
> [ 2120.861449] [<ffffffffa1012db7>] ? syscall_trace_enter_phase1+0xf7/0x150
> [ 2120.866641] [<ffffffffa1001017>] ? trace_hardirqs_on_thunk+0x17/0x19
> [ 2120.871850] [<ffffffffa108e64e>] SyS_kill+0xe/0x10
> [ 2120.877038] [<ffffffffa1800597>] entry_SYSCALL_64_fastpath+0x12/0x6f
> [ 2316.020180] INFO: rcu_preempt detected stalls on CPUs/tasks:
> [ 2316.025395] Tasks blocked on level-0 rcu_node (CPUs 0-3): P1497
> [ 2316.030662] (detected by 2, t=260007 jiffies, g=78835, c=78834, q=0)
> [ 2316.035890] trinity-watchdo R running task 14336 1497 1496 0x00080000
> [ 2316.041153] ffff8804b94e3c88 ffffffffa17fa0b0 ffff8805010a1b40 ffff8800cd2db680
> [ 2316.046552] ffff8805010a1b40 ffff8804b94e3c78 0000000000000000 ffff8804b94e4000
> [ 2316.052039] 0000000000000001 0000000000000001 0000000000000002 ffff8804b94e3ca8
> [ 2316.057560] Call Trace:
> [ 2316.063066] [<ffffffffa17fa0b0>] ? preempt_schedule_irq+0x40/0xa0
> [ 2316.068594] [<ffffffffa17fa0b6>] preempt_schedule_irq+0x46/0xa0
> [ 2316.074034] [<ffffffffa1801129>] retint_kernel+0x1b/0x2d
> [ 2316.079419] [<ffffffffa10ad5e1>] ? get_parent_ip+0x11/0x50
> [ 2316.084778] [<ffffffffa10ad5e1>] ? get_parent_ip+0x11/0x50
> [ 2316.090041] [<ffffffffa10d675e>] ? lock_release+0x33e/0x700
> [ 2316.095219] [<ffffffffa108be29>] ? group_send_sig_info+0x79/0x120
> [ 2316.100359] [<ffffffffa10ad5e1>] ? get_parent_ip+0x11/0x50
> [ 2316.105489] [<ffffffffa108c07b>] ? kill_pid_info+0x9b/0x150
> [ 2316.110584] [<ffffffffa108c098>] kill_pid_info+0xb8/0x150
> [ 2316.115594] [<ffffffffa108c000>] ? kill_pid_info+0x20/0x150
> [ 2316.120568] [<ffffffffa108c244>] SYSC_kill+0xf4/0x2b0
> [ 2316.125476] [<ffffffffa108c1ed>] ? SYSC_kill+0x9d/0x2b0
> [ 2316.130391] [<ffffffffa10d354b>] ? trace_hardirqs_on_caller+0x14b/0x1e0
> [ 2316.135348] [<ffffffffa10d35ed>] ? trace_hardirqs_on+0xd/0x10
> [ 2316.140351] [<ffffffffa1192ed3>] ? context_tracking_user_exit+0x13/0x20
> [ 2316.145399] [<ffffffffa1012db7>] ? syscall_trace_enter_phase1+0xf7/0x150
> [ 2316.150413] [<ffffffffa1001017>] ? trace_hardirqs_on_thunk+0x17/0x19
> [ 2316.155461] [<ffffffffa108e64e>] SyS_kill+0xe/0x10
> [ 2316.160499] [<ffffffffa1800597>] entry_SYSCALL_64_fastpath+0x12/0x6f
> [ 2316.165609] trinity-watchdo R running task 14336 1497 1496 0x00080000
> [ 2316.170851] ffff8804b94e3c88 ffffffffa17fa0b0 ffff8805010a1b40 ffff8800cd2db680
> [ 2316.176131] ffff8805010a1b40 ffff8804b94e3c78 0000000000000000 ffff8804b94e4000
> [ 2316.181443] 0000000000000001 0000000000000001 0000000000000002 ffff8804b94e3ca8
> [ 2316.186735] Call Trace:
> [ 2316.192021] [<ffffffffa17fa0b0>] ? preempt_schedule_irq+0x40/0xa0
> [ 2316.197426] [<ffffffffa17fa0b6>] preempt_schedule_irq+0x46/0xa0
> [ 2316.202818] [<ffffffffa1801129>] retint_kernel+0x1b/0x2d
> [ 2316.208181] [<ffffffffa10ad5e1>] ? get_parent_ip+0x11/0x50
> [ 2316.213541] [<ffffffffa10ad5e1>] ? get_parent_ip+0x11/0x50
> [ 2316.218823] [<ffffffffa10d675e>] ? lock_release+0x33e/0x700
> [ 2316.224042] [<ffffffffa108be29>] ? group_send_sig_info+0x79/0x120
> [ 2316.229215] [<ffffffffa10ad5e1>] ? get_parent_ip+0x11/0x50
> [ 2316.234346] [<ffffffffa108c07b>] ? kill_pid_info+0x9b/0x150
> [ 2316.239491] [<ffffffffa108c098>] kill_pid_info+0xb8/0x150
> [ 2316.244538] [<ffffffffa108c000>] ? kill_pid_info+0x20/0x150
> [ 2316.249525] [<ffffffffa108c244>] SYSC_kill+0xf4/0x2b0
> [ 2316.254503] [<ffffffffa108c1ed>] ? SYSC_kill+0x9d/0x2b0
> [ 2316.259492] [<ffffffffa10d354b>] ? trace_hardirqs_on_caller+0x14b/0x1e0
> [ 2316.264565] [<ffffffffa10d35ed>] ? trace_hardirqs_on+0xd/0x10
> [ 2316.269594] [<ffffffffa1192ed3>] ? context_tracking_user_exit+0x13/0x20
> [ 2316.274681] [<ffffffffa1012db7>] ? syscall_trace_enter_phase1+0xf7/0x150
> [ 2316.279772] [<ffffffffa1001017>] ? trace_hardirqs_on_thunk+0x17/0x19
> [ 2316.284873] [<ffffffffa108e64e>] SyS_kill+0xe/0x10
> [ 2316.289981] [<ffffffffa1800597>] entry_SYSCALL_64_fastpath+0x12/0x6f
>
> some time later..
>
> [66859.981825] INFO: rcu_preempt detected stalls on CPUs/tasks:
> [66859.987582] Tasks blocked on level-0 rcu_node (CPUs 0-3): P20837
> [66859.993375] (detected by 2, t=65013 jiffies, g=5129960, c=5129959, q=0)
> [66859.999146] trinity-c255 R running task 13728 20837 20581 0x00080002
> [66860.004980] ffff8804d1417cb8 ffffffffa17fa0b0 ffff8804e4dab680 ffff8804ba2951c0
> [66860.010831] ffff8804e4dab680 ffff8804d1417ca8 0000000000000000 ffff8804d1418000
> [66860.016666] ffff88045f49a440 ffff8805047a0620 ffff880480fdf528 ffff8804d1417cd8
> [66860.022423] Call Trace:
> [66860.028116] [<ffffffffa17fa0b0>] ? preempt_schedule_irq+0x40/0xa0
> [66860.033912] [<ffffffffa17fa0b6>] preempt_schedule_irq+0x46/0xa0
> [66860.039663] [<ffffffffa1801129>] retint_kernel+0x1b/0x2d
> [66860.045352] [<ffffffffa10d6299>] ? lock_acquire+0xd9/0x260
> [66860.051044] [<ffffffffa10ed3b1>] ? rcu_is_watching+0x1/0x60
> [66860.056725] [<ffffffffa1212abc>] ? dput+0x10c/0x390
> [66860.062377] [<ffffffffa12129d9>] ? dput+0x29/0x390
> [66860.068103] [<ffffffffa11fb6f1>] __fput+0x181/0x200
> [66860.073797] [<ffffffffa11fb576>] ? __fput+0x6/0x200
> [66860.079444] [<ffffffffa11fb7be>] ____fput+0xe/0x10
> [66860.085070] [<ffffffffa109dfdd>] task_work_run+0x8d/0xc0
> [66860.090699] [<ffffffffa107cef6>] do_exit+0x416/0xc60
> [66860.096304] [<ffffffffa1192ed3>] ? context_tracking_user_exit+0x13/0x20
> [66860.101978] [<ffffffffa107ec64>] do_group_exit+0x54/0xe0
> [66860.107684] [<ffffffffa107ed04>] SyS_exit_group+0x14/0x20
> [66860.113360] [<ffffffffa1800597>] entry_SYSCALL_64_fastpath+0x12/0x6f
>

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/