Re: Help on kernel rcu bug

From: Paul E. McKenney
Date: Wed Dec 13 2017 - 11:44:34 EST


On Wed, Dec 13, 2017 at 01:27:27PM +0800, Donghai wrote:
> Hi Paul,
>
> Thanks for your reply.
>
> We built our kernel based on the CentOS 3.10 kernel, and we use it for
> cloud computing.
> Yes, the kernel has CONFIG_NO_HZ_FULL=y and offloads callbacks to every CPU.
>
> Shouldn't ->nocb_q_count equal the number of callbacks on the ->nocb_head
> list? But the ->nocb_head list has rcu_barrier_callback as its first element
> and only a few callbacks on it, all other CPUs have already finished
> executing rcu_barrier_callback, and the rcuob kthread of the problematic
> rcu_data's CPU is sleeping in
> wait_event_interruptible(rdp->nocb_wq, rdp->nocb_head);
>
> Can commit b58cc46c5f6b ("rcu: Don't offload callbacks unless
> specifically requested") fix the problem?
> The problem occurs very rarely, and we have not found steps to
> reproduce it manually.

There have been quite a few fixes to issues with callback offloading and
NO_HZ_FULL since 3.10. I don't know which of these have been applied
to CentOS. But yes, there have been some fixes for issues where an
rcuo kthread would fail to wake up, and I have no idea whether or not
the CentOS people took those fixes.
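
To make that failure mode concrete: the enqueue path queues the callback on
the ->nocb_head list, bumps ->nocb_q_count, and is supposed to wake the rcuo
kthread, which sleeps until the list is non-empty, adopts the whole list, and
invokes the callbacks. If that wakeup is ever skipped while the kthread is
asleep, the count grows without bound and the stranded rcu_barrier_callback
never runs, so the rcu_barrier() in your backtrace never completes, which is
consistent with your dump. Here is a rough userspace sketch of that pairing
(plain pthreads, not the kernel source; the names enqueue_cb and nocb_kthread
and the lock/condition-variable plumbing are made up for the illustration):

/*
 * Rough userspace analogue (pthreads) of the no-CBs enqueue path and the
 * rcuo kthread.  The variables below are stand-ins for rdp->nocb_head,
 * rdp->nocb_q_count, and rdp->nocb_wq; this illustrates the wait/wake
 * pairing only, it is not the kernel code.
 */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

struct cb {
	struct cb *next;
};

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t wq = PTHREAD_COND_INITIALIZER;	/* ~ rdp->nocb_wq */
static struct cb *head;					/* ~ rdp->nocb_head */
static long q_count;					/* ~ rdp->nocb_q_count */

/* Queue a callback and wake the kthread, as the no-CBs enqueue path should. */
static void enqueue_cb(struct cb *c)
{
	pthread_mutex_lock(&lock);
	c->next = head;
	head = c;
	q_count++;
	/*
	 * This wakeup must happen whenever the list goes non-empty.  The
	 * kind of failure mentioned above corresponds to this signal being
	 * skipped while the kthread is already asleep, so q_count grows
	 * without bound and nothing ever gets invoked.
	 */
	pthread_cond_signal(&wq);
	pthread_mutex_unlock(&lock);
}

/* The consumer: sleep until callbacks appear, adopt the list, "invoke" them. */
static void *nocb_kthread(void *unused)
{
	struct cb *list, *next;

	(void)unused;
	for (;;) {
		pthread_mutex_lock(&lock);
		while (!head)	/* ~ wait_event_interruptible(rdp->nocb_wq, rdp->nocb_head) */
			pthread_cond_wait(&wq, &lock);
		list = head;	/* adopt the whole pending list */
		head = NULL;
		printf("invoking %ld callbacks\n", q_count);
		q_count = 0;
		pthread_mutex_unlock(&lock);
		for (; list; list = next) {
			next = list->next;
			free(list);	/* stand-in for invoking the callback */
		}
	}
	return NULL;
}

int main(void)
{
	pthread_t tid;
	int i;

	pthread_create(&tid, NULL, nocb_kthread, NULL);
	for (i = 0; i < 10; i++)
		enqueue_cb(calloc(1, sizeof(struct cb)));
	sleep(1);	/* give the consumer time to drain the list */
	return 0;
}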

The reason I pointed you at commit b58cc46c5f6b is that it avoids the
problem, at least in some cases. Though I would be surprised if
that fix was not pulled into CentOS, given that a number of Red Hat
people attended this talk:

https://lwn.net/Articles/629742/

This talk has a list of commits related to NO_HZ_FULL and callback
offloading.

Again, my recommendations are:

1. For CentOS issues, talk to CentOS people.

2. To avoid your problems with callback offloading and NO_HZ_FULL,
build your kernel with CONFIG_NO_HZ_FULL=n. Given that you
are doing cloud computing, this should work fine for your users.

Alternatively, you could compare the code in CentOS with current mainline
and then use "git blame" in order to identify fixes, but it is going to
be -way- easier to simply rebuild your kernel with CONFIG_NO_HZ_FULL=n.

Adding LKML on CC so that others who might be having the same problem
can find this.

Thanx, Paul

> Thanks
> Donghai.
>
>
>
>
>
> 2017-12-13 11:44 GMT+08:00 Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>:
>
> > On Wed, Dec 13, 2017 at 10:25:47AM +0800, Donghai wrote:
> > > Recently I have come across an RCU bug
> > > very similar to this one:
> > > https://www.spinics.net/lists/netdev/msg290830.html
> > >
> > > Can you provide a patch that solves the problem described at the URL above?
> >
> > That was a long time ago. My best guess is this one:
> >
> > b58cc46c5f6b ("rcu: Don't offload callbacks unless specifically requested")
> >
> > However, I am not familiar with CentOS, and I especially don't know
> > what they have and have not backported. So you really need to bring
> > this up with your contacts within CentOS.
> >
> > The very large ->nocb_q_count is expected when grace periods are not
> > completing. Are you also seeing RCU CPU stall warnings?
> >
> > You appear to have CONFIG_NO_HZ_FULL=y. Do you really need that? (Unless
> > you are doing certain specialized types of technical computing or
> > running CPU-bound real-time workloads, the answer is almost certainly
> > "no".) If not, and if you are willing to rebuild your kernel, I suggest
> > rebuilding with CONFIG_NO_HZ_FULL=n. That would make any number of
> > NO_HZ_FULL bugs that were present in 3.10 go away.
> >
> > Thanx, Paul
> >
> > > My kernel is CentOS with kernel 3.10.
> > > The output from the crash utility is:
> > > crash> bt 41468
> > > PID: 41468 TASK: ffff8813993ad0f0 CPU: 0 COMMAND: "acjail"
> > > #0 [ffff8806cdfbbc50] __schedule at ffffffff8163265d
> > > #1 [ffff8806cdfbbcb8] schedule at ffffffff81632cf9
> > > #2 [ffff8806cdfbbcc8] schedule_timeout at ffffffff816309d9
> > > #3 [ffff8806cdfbbd70] wait_for_completion at ffffffff816330c6
> > > #4 [ffff8806cdfbbdd0] _rcu_barrier at ffffffff8111ef8b
> > > #5 [ffff8806cdfbbe10] rcu_barrier at ffffffff8111f095
> > > #6 [ffff8806cdfbbe20] netdev_run_todo at ffffffff815264df
> > > #7 [ffff8806cdfbbe78] rtnl_unlock at ffffffff815314ae
> > > #8 [ffff8806cdfbbe88] tun_chr_close at ffffffffa04537b2 [tun]
> > > #9 [ffff8806cdfbbea8] __fput at ffffffff811dacc9
> > > #10 [ffff8806cdfbbef0] ____fput at ffffffff811daf8e
> > > #11 [ffff8806cdfbbf00] task_work_run at ffffffff8109b437
> > > #12 [ffff8806cdfbbf30] do_notify_resume at ffffffff81014b12
> > > #13 [ffff8806cdfbbf50] int_signal at ffffffff8163dfbd
> > > RIP: 00007f0484540a57 RSP: 00007fffd7b21670 RFLAGS: 00000202
> > > RAX: 0000000000000000 RBX: 00007f048496abc0 RCX: ffffffffffffffff
> > > RDX: 0000000000000002 RSI: 0000000000000000 RDI: 0000000000000007
> > > RBP: 00007fffd7b216a0 R8: 00007f048474d280 R9: 00007fffd7b21370
> > > R10: 00000000000000ea R11: 0000000000000202 R12: 0000000000000000
> > > R13: 00007fffd7b21a30 R14: 0000000000000000 R15: 0000000000000000
> > > ORIG_RAX: 0000000000000003 CS: 0033 SS: 002b
> > >
> > > and one CPU's rcu_data is strange, with a very large nocb_q_count:
> > > $16 = {
> > > completed = 1690601839,
> > > gpnum = 1690601840,
> > > passed_quiesce = true,
> > > qs_pending = true,
> > > beenonline = true,
> > > preemptible = false,
> > > mynode = 0xffffffff8199d800 <rcu_sched_state+512>,
> > > grpmask = 1,
> > > nxtlist = 0x0,
> > > nxttail = {0x0, 0x0, 0x0, 0x0},
> > > nxtcompleted = {0, 0, 0, 0},
> > > qlen_lazy = 0,
> > > qlen = 0,
> > > qlen_last_fqs_check = 4611686018427387903,
> > > *************overflow**************
> > > n_cbs_invoked = 0,
> > > n_nocbs_invoked = 3125697815,
> > > n_cbs_orphaned = 0,
> > > n_cbs_adopted = 0,
> > > n_force_qs_snap = 0,
> > > blimit = 10,
> > > dynticks = 0xffff881fff40de00,
> > > dynticks_snap = 353239209,
> > > dynticks_fqs = 14212500,
> > > offline_fqs = 0,
> > > n_rcu_pending = 1800174579,
> > > n_rp_qs_pending = 146251304,
> > > n_rp_report_qs = 236529497,
> > > n_rp_cb_ready = 0,
> > > n_rp_cpu_needs_gp = 117683,
> > > n_rp_gp_completed = 229379752,
> > > n_rp_gp_started = 22982115,
> > > n_rp_need_nothing = 1311165532,
> > > barrier_head = {
> > > next = 0xffff881e687c91f0,
> > > func = 0xffffffff8111d880 <rcu_barrier_callback>
> > > },
> > > nocb_head = 0xffff881fff40e0b8,
> > > nocb_tail = 0xffff880631907c50,
> > > nocb_q_count = {
> > > counter = 3635196 ****************** very large **************
> > > },
> > > nocb_q_count_lazy = {
> > > counter = 13211 ****************** very large **************
> > > },
> > > nocb_p_count = 20,
> > > nocb_p_count_lazy = 2,
> > > nocb_wq = {
> > > lock = {
> > > {
> > > rlock = {
> > > raw_lock = {
> > > {
> > > head_tail = 2158657706,
> > > tickets = {
> > > head = 32938,
> > > tail = 32938
> > >
> > > Thanks
> > >
> > > Donghai.
> >
> >