Re: [RCU] kernel hangs in wait_rcu_gp during suspend path

From: Paul E. McKenney
Date: Fri Dec 19 2014 - 19:25:44 EST


On Sat, Dec 20, 2014 at 12:25:57AM +0530, Arun KS wrote:
> Hi Paul,
>
> On Fri, Dec 19, 2014 at 1:35 AM, Paul E. McKenney
> <paulmck@xxxxxxxxxxxxxxxxxx> wrote:
> > On Thu, Dec 18, 2014 at 09:52:28PM +0530, Arun KS wrote:
> >> Hi Paul,
> >>
> >> On Thu, Dec 18, 2014 at 12:54 AM, Paul E. McKenney
> >> <paulmck@xxxxxxxxxxxxxxxxxx> wrote:
> >> > On Tue, Dec 16, 2014 at 11:00:20PM +0530, Arun KS wrote:
> >> >> Hello,
> >> >>
> >> >> Adding some more info.
> >> >>
> >> >> Below is the rcu_data data structure corresponding to cpu4.
> >> >
> >> > This shows that RCU is idle. What was the state of the system at the
> >> > time you collected this data?
> >>
> >> System initiated a suspend sequence and currently at disable_nonboot_cpus().
> >> It has hotplugged 0, 1 and 2 successfully. And even successful in hot
> >> plugging cpu3.
> >> But while calling the CPU_POST_DEAD notifier for cpu3, another driver tried to
> >> unregister an atomic notifier. Which eventually calls syncronize_rcu()
> >> which hangs the suspend task.
> >>
> >> bt as follows,
> >> PID: 202 TASK: edcd2a00 CPU: 4 COMMAND: "kworker/u16:4"
> >> #0 [<c0a1f8c0>] (__schedule) from [<c0a1d054>]
> >> #1 [<c0a1d054>] (schedule_timeout) from [<c0a1f018>]
> >> #2 [<c0a1f018>] (wait_for_common) from [<c013c570>]
> >> #3 [<c013c570>] (wait_rcu_gp) from [<c014407c>]
> >> #4 [<c014407c>] (atomic_notifier_chain_unregister) from [<c06be62c>]
> >> #5 [<c06be62c>] (cpufreq_interactive_disable_sched_input) from [<c06bee1c>]
> >> #6 [<c06bee1c>] (cpufreq_governor_interactive) from [<c06b7724>]
> >> #7 [<c06b7724>] (__cpufreq_governor) from [<c06b9f74>]
> >> #8 [<c06b9f74>] (__cpufreq_remove_dev_finish) from [<c06ba3a4>]
> >> #9 [<c06ba3a4>] (cpufreq_cpu_callback) from [<c0a22674>]
> >> #10 [<c0a22674>] (notifier_call_chain) from [<c012284c>]
> >> #11 [<c012284c>] (__cpu_notify) from [<c01229dc>]
> >> #12 [<c01229dc>] (cpu_notify_nofail) from [<c0a0dd1c>]
> >> #13 [<c0a0dd1c>] (_cpu_down) from [<c0122b48>]
> >> #14 [<c0122b48>] (disable_nonboot_cpus) from [<c0168cd8>]
> >> #15 [<c0168cd8>] (suspend_devices_and_enter) from [<c0169018>]
> >> #16 [<c0169018>] (pm_suspend) from [<c01691e0>]
> >> #17 [<c01691e0>] (try_to_suspend) from [<c01396f0>]
> >> #18 [<c01396f0>] (process_one_work) from [<c0139db0>]
> >> #19 [<c0139db0>] (worker_thread) from [<c013efa4>]
> >> #20 [<c013efa4>] (kthread) from [<c01061f8>]
> >>
> >> But the other cores 4-7 are active. I can see them going to idle tasks
> >> coming out from idle because of interrupts, scheduling kworkers etc.
> >> So when I took the data, all the online cores(4-7) were in idle as
> >> shown below from runq data structures.
> >>
> >> ------start--------------
> >> crash> runq
> >> CPU 0 [OFFLINE]
> >>
> >> CPU 1 [OFFLINE]
> >>
> >> CPU 2 [OFFLINE]
> >>
> >> CPU 3 [OFFLINE]
> >>
> >> CPU 4 RUNQUEUE: c5439040
> >> CURRENT: PID: 0 TASK: f0c9d400 COMMAND: "swapper/4"
> >> RT PRIO_ARRAY: c5439130
> >> [no tasks queued]
> >> CFS RB_ROOT: c54390c0
> >> [no tasks queued]
> >>
> >> CPU 5 RUNQUEUE: c5447040
> >> CURRENT: PID: 0 TASK: f0c9aa00 COMMAND: "swapper/5"
> >> RT PRIO_ARRAY: c5447130
> >> [no tasks queued]
> >> CFS RB_ROOT: c54470c0
> >> [no tasks queued]
> >>
> >> CPU 6 RUNQUEUE: c5455040
> >> CURRENT: PID: 0 TASK: f0c9ce00 COMMAND: "swapper/6"
> >> RT PRIO_ARRAY: c5455130
> >> [no tasks queued]
> >> CFS RB_ROOT: c54550c0
> >> [no tasks queued]
> >>
> >> CPU 7 RUNQUEUE: c5463040
> >> CURRENT: PID: 0 TASK: f0c9b000 COMMAND: "swapper/7"
> >> RT PRIO_ARRAY: c5463130
> >> [no tasks queued]
> >> CFS RB_ROOT: c54630c0
> >> [no tasks queued]
> >> ------end--------------
> >>
> >> but one strange thing i can see is that rcu_read_lock_nesting for idle
> >> tasks running on cpu 5 and cpu 6 are set to 1.
> >>
> >> PID: 0 TASK: f0c9d400 CPU: 4 COMMAND: "swapper/4"
> >> rcu_read_lock_nesting = 0,
> >>
> >> PID: 0 TASK: f0c9aa00 CPU: 5 COMMAND: "swapper/5"
> >> rcu_read_lock_nesting = 1,
> >>
> >> PID: 0 TASK: f0c9ce00 CPU: 6 COMMAND: "swapper/6"
> >> rcu_read_lock_nesting = 1,
> >>
> >> PID: 0 TASK: f0c9b000 CPU: 7 COMMAND: "swapper/7"
> >> rcu_read_lock_nesting = 0,
> >>
> >> Does this means that the current grace period(suspend thread is
> >> waiting on) is getting extended infinitely?
> >
> > Indeed it does, good catch! Looks like someone entered an RCU read-side
> > critical section, then forgot to exit it, which would prevent grace
> > periods from ever completing. CONFIG_PROVE_RCU=y might be
> > helpful in tracking this down.
> >
> >> Also attaching the per_cpu rcu_data for online and offline cores.
> >
> > But these still look like there is no grace period in progress.
> >
> > Still, it would be good to try CONFIG_PROVE_RCU=y and see what it
> > shows you.
> >
> > Also, I am not seeing similar complaints about 3.10, so it is quite
> > possible that a recent change in the ARM-specific idle-loop code is
> > doing this to you. It might be well worth looking through recent
> > changes in this area, particularly if some older version works well
> > for you.
>
> Enabling CONFIG_PROVE_RCU also didn't help.
>
> But we figured out the problem. Thanks for your help. And need your
> suggestion in fixing.
>
> if we dump the irq_work_list,
>
> crash> irq_work_list
> PER-CPU DATA TYPE:
> struct llist_head irq_work_list;
> PER-CPU ADDRESSES:
> [0]: c53ff90c
> [1]: c540d90c
> [2]: c541b90c
> [3]: c542990c
> [4]: c543790c
> [5]: c544590c
> [6]: c545390c
> [7]: c546190c
> crash>
> crash> list irq_work.llnode -s irq_work.func -h c117f0b4
> c117f0b4
> func = 0xc0195aa8 <rsp_wakeup>
> c117f574
> func = 0xc0195aa8 <rsp_wakeup>
> crash>
>
> rsp_wakeup is pending in the cpu1's irq_work_list. And cpu1 is already
> hot-plugged out.
> All the later irq_work_queue calls returns because the work is already pending.
>
> When the issue happens, noticed that a hotplug is happening during
> early stages of boot(due to thermal), even before irq_work registers a
> cpu_notifer callback. And hence __irq_work_run() will not run as a
> part of CPU_DYING notifier.
> https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/tree/kernel/irq_work.c?id=refs/tags/v3.10.63#n198
>
> and rsp_wakeup is pending from there.
>
> In first approach, we changed the
> device_initcall(irq_work_init_cpu_notifier) to earlyinit_initcall and
> the issue goes away. Because this makes sure that we have a cpu
> notifier before any hotplug happens.
> diff --git a/kernel/irq_work.c b/kernel/irq_work.c
> index 55fcce6..5e58767 100644
> --- a/kernel/irq_work.c
> +++ b/kernel/irq_work.c
> @@ -198,6 +198,6 @@ static __init int irq_work_init_cpu_notifier(void)
> register_cpu_notifier(&cpu_notify);
> return 0;
> }
> -device_initcall(irq_work_init_cpu_notifier);
> +early_initcall(irq_work_init_cpu_notifier);

I prefer this approach. Another alternative is to keep the device_initcall(),
but to do something more or less like:

get_online_cpus();
for_each_possible_cpu(cpu)
if (cpu_online(cpu))
your_notifier(&cpu_notify, CPU_ONLINE, cpu);
else
your_notifier(&cpu_notify, CPU_DEAD, cpu);
put_online_cpus();

The details will depend on how your notifier is structured.

If feasible, moving to early_initcall() is simpler. You could of course
move some of the code to an early_initcall() while leaving the rest at
device_initcall() time.

> #endif /* CONFIG_HOTPLUG_CPU */
>
>
> Another approach is to add syncronize_rcu() in cpu_down path. This
> way we makes sure that hotplug waits until a grace period gets over.
> This also fixes the problem.
> diff --git a/kernel/cpu.c b/kernel/cpu.c
> index c56b958..00bdd90 100644
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -311,6 +311,11 @@ static int __ref _cpu_down(unsigned int cpu, int
> tasks_frozen)
> __func__, cpu);
> goto out_release;
> }
> +#ifdef CONFIG_PREEMPT
> + synchronize_sched();
> +#endif
> + synchronize_rcu();
> +

This would seriously slow down CPU hotplug for everyone, which is not
warranted to fix a single architecture.

So the earlier move to early_initcall() is much better.

Thanx, Paul

> smpboot_park_threads(cpu);
>
> err = __stop_machine(take_cpu_down, &tcd_param, cpumask_of(cpu));
>
>
>
> Can you please suggest the rite approach?
>
> Thanks,
> Arun
>
>
>
>
>
> >
> > Thanx, Paul
> >
> >> Thanks,
> >> Arun
> >>
> >> >
> >> > Thanx, Paul
> >> >
> >> >> struct rcu_data {
> >> >> completed = 5877,
> >> >> gpnum = 5877,
> >> >> passed_quiesce = true,
> >> >> qs_pending = false,
> >> >> beenonline = true,
> >> >> preemptible = true,
> >> >> mynode = 0xc117f340 <rcu_preempt_state>,
> >> >> grpmask = 16,
> >> >> nxtlist = 0xedaaec00,
> >> >> nxttail = {0xc54366c4, 0xe84d350c, 0xe84d350c, 0xe84d350c},
> >> >> nxtcompleted = {4294967035, 5878, 5878, 5878},
> >> >> qlen_lazy = 105,
> >> >> qlen = 415,
> >> >> qlen_last_fqs_check = 0,
> >> >> n_cbs_invoked = 86323,
> >> >> n_nocbs_invoked = 0,
> >> >> n_cbs_orphaned = 0,
> >> >> n_cbs_adopted = 139,
> >> >> n_force_qs_snap = 0,
> >> >> blimit = 10,
> >> >> dynticks = 0xc5436758,
> >> >> dynticks_snap = 7582140,
> >> >> dynticks_fqs = 41,
> >> >> offline_fqs = 0,
> >> >> n_rcu_pending = 59404,
> >> >> n_rp_qs_pending = 5,
> >> >> n_rp_report_qs = 4633,
> >> >> n_rp_cb_ready = 32,
> >> >> n_rp_cpu_needs_gp = 41088,
> >> >> n_rp_gp_completed = 2844,
> >> >> n_rp_gp_started = 1150,
> >> >> n_rp_need_nothing = 9657,
> >> >> barrier_head = {
> >> >> next = 0x0,
> >> >> func = 0x0
> >> >> },
> >> >> oom_head = {
> >> >> next = 0x0,
> >> >> func = 0x0
> >> >> },
> >> >> cpu = 4,
> >> >> rsp = 0xc117f340 <rcu_preempt_state>
> >> >> }
> >> >>
> >> >>
> >> >>
> >> >> Also pasting complete rcu_preempt_state.
> >> >>
> >> >>
> >> >>
> >> >> rcu_preempt_state = $9 = {
> >> >> node = {{
> >> >> lock = {
> >> >> raw_lock = {
> >> >> {
> >> >> slock = 3129850509,
> >> >> tickets = {
> >> >> owner = 47757,
> >> >> next = 47757
> >> >> }
> >> >> }
> >> >> },
> >> >> magic = 3735899821,
> >> >> owner_cpu = 4294967295,
> >> >> owner = 0xffffffff
> >> >> },
> >> >> gpnum = 5877,
> >> >> completed = 5877,
> >> >> qsmask = 0,
> >> >> expmask = 0,
> >> >> qsmaskinit = 240,
> >> >> grpmask = 0,
> >> >> grplo = 0,
> >> >> grphi = 7,
> >> >> grpnum = 0 '\000',
> >> >> level = 0 '\000',
> >> >> parent = 0x0,
> >> >> blkd_tasks = {
> >> >> next = 0xc117f378 <rcu_preempt_state+56>,
> >> >> prev = 0xc117f378 <rcu_preempt_state+56>
> >> >> },
> >> >> gp_tasks = 0x0,
> >> >> exp_tasks = 0x0,
> >> >> need_future_gp = {1, 0},
> >> >> fqslock = {
> >> >> raw_lock = {
> >> >> {
> >> >> slock = 0,
> >> >> tickets = {
> >> >> owner = 0,
> >> >> next = 0
> >> >> }
> >> >> }
> >> >> },
> >> >> magic = 3735899821,
> >> >> owner_cpu = 4294967295,
> >> >> owner = 0xffffffff
> >> >> }
> >> >> }},
> >> >> level = {0xc117f340 <rcu_preempt_state>},
> >> >> levelcnt = {1, 0, 0, 0, 0},
> >> >> levelspread = "\b",
> >> >> rda = 0xc115e6b0 <rcu_preempt_data>,
> >> >> call = 0xc01975ac <call_rcu>,
> >> >> fqs_state = 0 '\000',
> >> >> boost = 0 '\000',
> >> >> gpnum = 5877,
> >> >> completed = 5877,
> >> >> gp_kthread = 0xf0c9e600,
> >> >> gp_wq = {
> >> >> lock = {
> >> >> {
> >> >> rlock = {
> >> >> raw_lock = {
> >> >> {
> >> >> slock = 2160230594,
> >> >> tickets = {
> >> >> owner = 32962,
> >> >> next = 32962
> >> >> }
> >> >> }
> >> >> },
> >> >> magic = 3735899821,
> >> >> owner_cpu = 4294967295,
> >> >> owner = 0xffffffff
> >> >> }
> >> >> }
> >> >> },
> >> >> task_list = {
> >> >> next = 0xf0cd1f20,
> >> >> prev = 0xf0cd1f20
> >> >> }
> >> >> },
> >> >> gp_flags = 1,
> >> >> orphan_lock = {
> >> >> raw_lock = {
> >> >> {
> >> >> slock = 327685,
> >> >> tickets = {
> >> >> owner = 5,
> >> >> next = 5
> >> >> }
> >> >> }
> >> >> },
> >> >> magic = 3735899821,
> >> >> owner_cpu = 4294967295,
> >> >> owner = 0xffffffff
> >> >> },
> >> >> orphan_nxtlist = 0x0,
> >> >> orphan_nxttail = 0xc117f490 <rcu_preempt_state+336>,
> >> >> orphan_donelist = 0x0,
> >> >> orphan_donetail = 0xc117f498 <rcu_preempt_state+344>,
> >> >> qlen_lazy = 0,
> >> >> qlen = 0,
> >> >> onoff_mutex = {
> >> >> count = {
> >> >> counter = 1
> >> >> },
> >> >> wait_lock = {
> >> >> {
> >> >> rlock = {
> >> >> raw_lock = {
> >> >> {
> >> >> slock = 811479134,
> >> >> tickets = {
> >> >> owner = 12382,
> >> >> next = 12382
> >> >> }
> >> >> }
> >> >> },
> >> >> magic = 3735899821,
> >> >> owner_cpu = 4294967295,
> >> >> owner = 0xffffffff
> >> >> }
> >> >> }
> >> >> },
> >> >> wait_list = {
> >> >> next = 0xc117f4bc <rcu_preempt_state+380>,
> >> >> prev = 0xc117f4bc <rcu_preempt_state+380>
> >> >> },
> >> >> owner = 0x0,
> >> >> name = 0x0,
> >> >> magic = 0xc117f4a8 <rcu_preempt_state+360>
> >> >> },
> >> >> barrier_mutex = {
> >> >> count = {
> >> >> counter = 1
> >> >> },
> >> >> wait_lock = {
> >> >> {
> >> >> rlock = {
> >> >> raw_lock = {
> >> >> {
> >> >> slock = 0,
> >> >> tickets = {
> >> >> owner = 0,
> >> >> next = 0
> >> >> }
> >> >> }
> >> >> },
> >> >> magic = 3735899821,
> >> >> owner_cpu = 4294967295,
> >> >> owner = 0xffffffff
> >> >> }
> >> >> }
> >> >> },
> >> >> wait_list = {
> >> >> next = 0xc117f4e4 <rcu_preempt_state+420>,
> >> >> prev = 0xc117f4e4 <rcu_preempt_state+420>
> >> >> },
> >> >> owner = 0x0,
> >> >> name = 0x0,
> >> >> magic = 0xc117f4d0 <rcu_preempt_state+400>
> >> >> },
> >> >> barrier_cpu_count = {
> >> >> counter = 0
> >> >> },
> >> >> barrier_completion = {
> >> >> done = 0,
> >> >> wait = {
> >> >> lock = {
> >> >> {
> >> >> rlock = {
> >> >> raw_lock = {
> >> >> {
> >> >> slock = 0,
> >> >> tickets = {
> >> >> owner = 0,
> >> >> next = 0
> >> >> }
> >> >> }
> >> >> },
> >> >> magic = 0,
> >> >> owner_cpu = 0,
> >> >> owner = 0x0
> >> >> }
> >> >> }
> >> >> },
> >> >> task_list = {
> >> >> next = 0x0,
> >> >> prev = 0x0
> >> >> }
> >> >> }
> >> >> },
> >> >> n_barrier_done = 0,
> >> >> expedited_start = {
> >> >> counter = 0
> >> >> },
> >> >> expedited_done = {
> >> >> counter = 0
> >> >> },
> >> >> expedited_wrap = {
> >> >> counter = 0
> >> >> },
> >> >> expedited_tryfail = {
> >> >> counter = 0
> >> >> },
> >> >> expedited_workdone1 = {
> >> >> counter = 0
> >> >> },
> >> >> expedited_workdone2 = {
> >> >> counter = 0
> >> >> },
> >> >> expedited_normal = {
> >> >> counter = 0
> >> >> },
> >> >> expedited_stoppedcpus = {
> >> >> counter = 0
> >> >> },
> >> >> expedited_done_tries = {
> >> >> counter = 0
> >> >> },
> >> >> expedited_done_lost = {
> >> >> counter = 0
> >> >> },
> >> >> expedited_done_exit = {
> >> >> counter = 0
> >> >> },
> >> >> jiffies_force_qs = 4294963917,
> >> >> n_force_qs = 4028,
> >> >> n_force_qs_lh = 0,
> >> >> n_force_qs_ngp = 0,
> >> >> gp_start = 4294963911,
> >> >> jiffies_stall = 4294966011,
> >> >> gp_max = 17,
> >> >> name = 0xc0d833ab "rcu_preempt",
> >> >> abbr = 112 'p',
> >> >> flavors = {
> >> >> next = 0xc117f2ec <rcu_bh_state+556>,
> >> >> prev = 0xc117f300 <rcu_struct_flavors>
> >> >> },
> >> >> wakeup_work = {
> >> >> flags = 3,
> >> >> llnode = {
> >> >> next = 0x0
> >> >> },
> >> >> func = 0xc0195aa8 <rsp_wakeup>
> >> >> }
> >> >> }
> >> >>
> >> >> Hope this helps.
> >> >>
> >> >> Thanks,
> >> >> Arun
> >> >>
> >> >>
> >> >> On Tue, Dec 16, 2014 at 11:59 AM, Arun KS <arunks.linux@xxxxxxxxx> wrote:
> >> >> > Hello,
> >> >> >
> >> >> > I dig little deeper to understand the situation.
> >> >> > All other cpus are in idle thread already.
> >> >> > As per my understanding, for the grace period to end, at-least one of
> >> >> > the following should happen on all online cpus,
> >> >> >
> >> >> > 1. a context switch.
> >> >> > 2. user space switch.
> >> >> > 3. switch to idle thread.
> >> >> >
> >> >> > In this situation, since all the other cores are already in idle, non
> >> >> > of the above are meet on all online cores.
> >> >> > So grace period is getting extended and never finishes. Below is the
> >> >> > state of runqueue when the hang happens.
> >> >> > --------------start------------------------------------
> >> >> > crash> runq
> >> >> > CPU 0 [OFFLINE]
> >> >> >
> >> >> > CPU 1 [OFFLINE]
> >> >> >
> >> >> > CPU 2 [OFFLINE]
> >> >> >
> >> >> > CPU 3 [OFFLINE]
> >> >> >
> >> >> > CPU 4 RUNQUEUE: c3192e40
> >> >> > CURRENT: PID: 0 TASK: f0874440 COMMAND: "swapper/4"
> >> >> > RT PRIO_ARRAY: c3192f20
> >> >> > [no tasks queued]
> >> >> > CFS RB_ROOT: c3192eb0
> >> >> > [no tasks queued]
> >> >> >
> >> >> > CPU 5 RUNQUEUE: c31a0e40
> >> >> > CURRENT: PID: 0 TASK: f0874980 COMMAND: "swapper/5"
> >> >> > RT PRIO_ARRAY: c31a0f20
> >> >> > [no tasks queued]
> >> >> > CFS RB_ROOT: c31a0eb0
> >> >> > [no tasks queued]
> >> >> >
> >> >> > CPU 6 RUNQUEUE: c31aee40
> >> >> > CURRENT: PID: 0 TASK: f0874ec0 COMMAND: "swapper/6"
> >> >> > RT PRIO_ARRAY: c31aef20
> >> >> > [no tasks queued]
> >> >> > CFS RB_ROOT: c31aeeb0
> >> >> > [no tasks queued]
> >> >> >
> >> >> > CPU 7 RUNQUEUE: c31bce40
> >> >> > CURRENT: PID: 0 TASK: f0875400 COMMAND: "swapper/7"
> >> >> > RT PRIO_ARRAY: c31bcf20
> >> >> > [no tasks queued]
> >> >> > CFS RB_ROOT: c31bceb0
> >> >> > [no tasks queued]
> >> >> > --------------end------------------------------------
> >> >> >
> >> >> > If my understanding is correct the below patch should help, because it
> >> >> > will expedite grace periods during suspend,
> >> >> > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=d1d74d14e98a6be740a6f12456c7d9ad47be9c9c
> >> >> >
> >> >> > But I wonder why it was not taken to stable trees. Can we take it?
> >> >> > Appreciate your help.
> >> >> >
> >> >> > Thanks,
> >> >> > Arun
> >> >> >
> >> >> > On Mon, Dec 15, 2014 at 10:34 PM, Arun KS <arunks.linux@xxxxxxxxx> wrote:
> >> >> >> Hi,
> >> >> >>
> >> >> >> Here is the backtrace of the process hanging in wait_rcu_gp,
> >> >> >>
> >> >> >> PID: 247 TASK: e16e7380 CPU: 4 COMMAND: "kworker/u16:5"
> >> >> >> #0 [<c09fead0>] (__schedule) from [<c09fcab0>]
> >> >> >> #1 [<c09fcab0>] (schedule_timeout) from [<c09fe050>]
> >> >> >> #2 [<c09fe050>] (wait_for_common) from [<c013b2b4>]
> >> >> >> #3 [<c013b2b4>] (wait_rcu_gp) from [<c0142f50>]
> >> >> >> #4 [<c0142f50>] (atomic_notifier_chain_unregister) from [<c06b2ab8>]
> >> >> >> #5 [<c06b2ab8>] (cpufreq_interactive_disable_sched_input) from [<c06b32a8>]
> >> >> >> #6 [<c06b32a8>] (cpufreq_governor_interactive) from [<c06abbf8>]
> >> >> >> #7 [<c06abbf8>] (__cpufreq_governor) from [<c06ae474>]
> >> >> >> #8 [<c06ae474>] (__cpufreq_remove_dev_finish) from [<c06ae8c0>]
> >> >> >> #9 [<c06ae8c0>] (cpufreq_cpu_callback) from [<c0a0185c>]
> >> >> >> #10 [<c0a0185c>] (notifier_call_chain) from [<c0121888>]
> >> >> >> #11 [<c0121888>] (__cpu_notify) from [<c0121a04>]
> >> >> >> #12 [<c0121a04>] (cpu_notify_nofail) from [<c09ee7f0>]
> >> >> >> #13 [<c09ee7f0>] (_cpu_down) from [<c0121b70>]
> >> >> >> #14 [<c0121b70>] (disable_nonboot_cpus) from [<c016788c>]
> >> >> >> #15 [<c016788c>] (suspend_devices_and_enter) from [<c0167bcc>]
> >> >> >> #16 [<c0167bcc>] (pm_suspend) from [<c0167d94>]
> >> >> >> #17 [<c0167d94>] (try_to_suspend) from [<c0138460>]
> >> >> >> #18 [<c0138460>] (process_one_work) from [<c0138b18>]
> >> >> >> #19 [<c0138b18>] (worker_thread) from [<c013dc58>]
> >> >> >> #20 [<c013dc58>] (kthread) from [<c01061b8>]
> >> >> >>
> >> >> >> Will this patch helps here,
> >> >> >> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=d1d74d14e98a6be740a6f12456c7d9ad47be9c9c
> >> >> >>
> >> >> >> I couldn't really understand why it got struck in synchronize_rcu().
> >> >> >> Please give some pointers to debug this further.
> >> >> >>
> >> >> >> Below are the configs enable related to RCU.
> >> >> >>
> >> >> >> CONFIG_TREE_PREEMPT_RCU=y
> >> >> >> CONFIG_PREEMPT_RCU=y
> >> >> >> CONFIG_RCU_STALL_COMMON=y
> >> >> >> CONFIG_RCU_FANOUT=32
> >> >> >> CONFIG_RCU_FANOUT_LEAF=16
> >> >> >> CONFIG_RCU_FAST_NO_HZ=y
> >> >> >> CONFIG_RCU_CPU_STALL_TIMEOUT=21
> >> >> >> CONFIG_RCU_CPU_STALL_VERBOSE=y
> >> >> >>
> >> >> >> Kernel version is 3.10.28
> >> >> >> Architecture is ARM
> >> >> >>
> >> >> >> Thanks,
> >> >> >> Arun
> >> >>
> >> >
> >
> >> crash> struct rcu_data C54286B0
> >> struct rcu_data {
> >> completed = 2833,
> >> gpnum = 2833,
> >> passed_quiesce = false,
> >> qs_pending = false,
> >> beenonline = true,
> >> preemptible = true,
> >> mynode = 0xc117f340 <rcu_preempt_state>,
> >> grpmask = 8,
> >> nxtlist = 0x0,
> >> nxttail = {0xc54286c4, 0xc54286c4, 0xc54286c4, 0x0},
> >> nxtcompleted = {0, 4294967136, 4294967137, 4294967137},
> >> qlen_lazy = 0,
> >> qlen = 0,
> >> qlen_last_fqs_check = 0,
> >> n_cbs_invoked = 609,
> >> n_nocbs_invoked = 0,
> >> n_cbs_orphaned = 13,
> >> n_cbs_adopted = 0,
> >> n_force_qs_snap = 1428,
> >> blimit = 10,
> >> dynticks = 0xc5428758,
> >> dynticks_snap = 13206053,
> >> dynticks_fqs = 16,
> >> offline_fqs = 0,
> >> n_rcu_pending = 181,
> >> n_rp_qs_pending = 1,
> >> n_rp_report_qs = 21,
> >> n_rp_cb_ready = 0,
> >> n_rp_cpu_needs_gp = 0,
> >> n_rp_gp_completed = 22,
> >> n_rp_gp_started = 8,
> >> n_rp_need_nothing = 130,
> >> barrier_head = {
> >> next = 0x0,
> >> func = 0x0
> >> },
> >> oom_head = {
> >> next = 0x0,
> >> func = 0x0
> >> },
> >> cpu = 3,
> >> rsp = 0xc117f340 <rcu_preempt_state>
> >> }
> >>
> >> crash> struct rcu_data C541A6B0
> >> struct rcu_data {
> >> completed = 5877,
> >> gpnum = 5877,
> >> passed_quiesce = true,
> >> qs_pending = false,
> >> beenonline = true,
> >> preemptible = true,
> >> mynode = 0xc117f340 <rcu_preempt_state>,
> >> grpmask = 4,
> >> nxtlist = 0x0,
> >> nxttail = {0xc541a6c4, 0xc541a6c4, 0xc541a6c4, 0x0},
> >> nxtcompleted = {0, 5877, 5878, 5878},
> >> qlen_lazy = 0,
> >> qlen = 0,
> >> qlen_last_fqs_check = 0,
> >> n_cbs_invoked = 61565,
> >> n_nocbs_invoked = 0,
> >> n_cbs_orphaned = 139,
> >> n_cbs_adopted = 100,
> >> n_force_qs_snap = 0,
> >> blimit = 10,
> >> dynticks = 0xc541a758,
> >> dynticks_snap = 13901017,
> >> dynticks_fqs = 75,
> >> offline_fqs = 0,
> >> n_rcu_pending = 16546,
> >> n_rp_qs_pending = 3,
> >> n_rp_report_qs = 4539,
> >> n_rp_cb_ready = 69,
> >> n_rp_cpu_needs_gp = 782,
> >> n_rp_gp_completed = 4196,
> >> n_rp_gp_started = 1739,
> >> n_rp_need_nothing = 5221,
> >> barrier_head = {
> >> next = 0x0,
> >> func = 0x0
> >> },
> >> oom_head = {
> >> next = 0x0,
> >> func = 0x0
> >> },
> >> cpu = 2,
> >> rsp = 0xc117f340 <rcu_preempt_state>
> >> }
> >>
> >> crash> struct rcu_data C540C6B0
> >> struct rcu_data {
> >> completed = 5877,
> >> gpnum = 5877,
> >> passed_quiesce = true,
> >> qs_pending = false,
> >> beenonline = true,
> >> preemptible = true,
> >> mynode = 0xc117f340 <rcu_preempt_state>,
> >> grpmask = 2,
> >> nxtlist = 0x0,
> >> nxttail = {0xc540c6c4, 0xc540c6c4, 0xc540c6c4, 0x0},
> >> nxtcompleted = {4294967030, 5878, 5878, 5878},
> >> qlen_lazy = 0,
> >> qlen = 0,
> >> qlen_last_fqs_check = 0,
> >> n_cbs_invoked = 74292,
> >> n_nocbs_invoked = 0,
> >> n_cbs_orphaned = 100,
> >> n_cbs_adopted = 65,
> >> n_force_qs_snap = 0,
> >> blimit = 10,
> >> dynticks = 0xc540c758,
> >> dynticks_snap = 10753433,
> >> dynticks_fqs = 69,
> >> offline_fqs = 0,
> >> n_rcu_pending = 18350,
> >> n_rp_qs_pending = 6,
> >> n_rp_report_qs = 5009,
> >> n_rp_cb_ready = 50,
> >> n_rp_cpu_needs_gp = 915,
> >> n_rp_gp_completed = 4423,
> >> n_rp_gp_started = 1826,
> >> n_rp_need_nothing = 6127,
> >> barrier_head = {
> >> next = 0x0,
> >> func = 0x0
> >> },
> >> oom_head = {
> >> next = 0x0,
> >> func = 0x0
> >> },
> >> cpu = 1,
> >> rsp = 0xc117f340 <rcu_preempt_state>
> >> }
> >> crash> struct rcu_data C53FE6B0
> >> struct rcu_data {
> >> completed = 5877,
> >> gpnum = 5877,
> >> passed_quiesce = true,
> >> qs_pending = false,
> >> beenonline = true,
> >> preemptible = true,
> >> mynode = 0xc117f340 <rcu_preempt_state>,
> >> grpmask = 1,
> >> nxtlist = 0x0,
> >> nxttail = {0xc53fe6c4, 0xc53fe6c4, 0xc53fe6c4, 0x0},
> >> nxtcompleted = {4294966997, 5875, 5876, 5876},
> >> qlen_lazy = 0,
> >> qlen = 0,
> >> qlen_last_fqs_check = 0,
> >> n_cbs_invoked = 123175,
> >> n_nocbs_invoked = 0,
> >> n_cbs_orphaned = 52,
> >> n_cbs_adopted = 0,
> >> n_force_qs_snap = 0,
> >> blimit = 10,
> >> dynticks = 0xc53fe758,
> >> dynticks_snap = 6330446,
> >> dynticks_fqs = 46,
> >> offline_fqs = 0,
> >> n_rcu_pending = 22529,
> >> n_rp_qs_pending = 3,
> >> n_rp_report_qs = 5290,
> >> n_rp_cb_ready = 279,
> >> n_rp_cpu_needs_gp = 740,
> >> n_rp_gp_completed = 2707,
> >> n_rp_gp_started = 1208,
> >> n_rp_need_nothing = 12305,
> >> barrier_head = {
> >> next = 0x0,
> >> func = 0x0
> >> },
> >> oom_head = {
> >> next = 0x0,
> >> func = 0x0
> >> },
> >> cpu = 0,
> >> rsp = 0xc117f340 <rcu_preempt_state>
> >> }
> >>
> >
> >> crash> struct rcu_data c54366b0
> >> struct rcu_data {
> >> completed = 5877,
> >> gpnum = 5877,
> >> passed_quiesce = true,
> >> qs_pending = false,
> >> beenonline = true,
> >> preemptible = true,
> >> mynode = 0xc117f340 <rcu_preempt_state>,
> >> grpmask = 16,
> >> nxtlist = 0xedaaec00,
> >> nxttail = {0xc54366c4, 0xe84d350c, 0xe84d350c, 0xe84d350c},
> >> nxtcompleted = {4294967035, 5878, 5878, 5878},
> >> qlen_lazy = 105,
> >> qlen = 415,
> >> qlen_last_fqs_check = 0,
> >> n_cbs_invoked = 86323,
> >> n_nocbs_invoked = 0,
> >> n_cbs_orphaned = 0,
> >> n_cbs_adopted = 139,
> >> n_force_qs_snap = 0,
> >> blimit = 10,
> >> dynticks = 0xc5436758,
> >> dynticks_snap = 7582140,
> >> dynticks_fqs = 41,
> >> offline_fqs = 0,
> >> n_rcu_pending = 59404,
> >> n_rp_qs_pending = 5,
> >> n_rp_report_qs = 4633,
> >> n_rp_cb_ready = 32,
> >> n_rp_cpu_needs_gp = 41088,
> >> n_rp_gp_completed = 2844,
> >> n_rp_gp_started = 1150,
> >> n_rp_need_nothing = 9657,
> >> barrier_head = {
> >> next = 0x0,
> >> func = 0x0
> >> },
> >> oom_head = {
> >> next = 0x0,
> >> func = 0x0
> >> },
> >> cpu = 4,
> >> rsp = 0xc117f340 <rcu_preempt_state>
> >> }
> >>
> >> crash> struct rcu_data c54446b0
> >> struct rcu_data {
> >> completed = 5877,
> >> gpnum = 5877,
> >> passed_quiesce = true,
> >> qs_pending = false,
> >> beenonline = true,
> >> preemptible = true,
> >> mynode = 0xc117f340 <rcu_preempt_state>,
> >> grpmask = 32,
> >> nxtlist = 0xcf9e856c,
> >> nxttail = {0xc54446c4, 0xcfb3050c, 0xcfb3050c, 0xcfb3050c},
> >> nxtcompleted = {0, 5878, 5878, 5878},
> >> qlen_lazy = 0,
> >> qlen = 117,
> >> qlen_last_fqs_check = 0,
> >> n_cbs_invoked = 36951,
> >> n_nocbs_invoked = 0,
> >> n_cbs_orphaned = 0,
> >> n_cbs_adopted = 0,
> >> n_force_qs_snap = 1428,
> >> blimit = 10,
> >> dynticks = 0xc5444758,
> >> dynticks_snap = 86034,
> >> dynticks_fqs = 46,
> >> offline_fqs = 0,
> >> n_rcu_pending = 49104,
> >> n_rp_qs_pending = 3,
> >> n_rp_report_qs = 2360,
> >> n_rp_cb_ready = 18,
> >> n_rp_cpu_needs_gp = 40106,
> >> n_rp_gp_completed = 1334,
> >> n_rp_gp_started = 791,
> >> n_rp_need_nothing = 4495,
> >> barrier_head = {
> >> next = 0x0,
> >> func = 0x0
> >> },
> >> oom_head = {
> >> next = 0x0,
> >> func = 0x0
> >> },
> >> cpu = 5,
> >> rsp = 0xc117f340 <rcu_preempt_state>
> >> }
> >>
> >> crash> struct rcu_data c54526b0
> >> struct rcu_data {
> >> completed = 5877,
> >> gpnum = 5877,
> >> passed_quiesce = true,
> >> qs_pending = false,
> >> beenonline = true,
> >> preemptible = true,
> >> mynode = 0xc117f340 <rcu_preempt_state>,
> >> grpmask = 64,
> >> nxtlist = 0xe613d200,
> >> nxttail = {0xc54526c4, 0xe6fc9d0c, 0xe6fc9d0c, 0xe6fc9d0c},
> >> nxtcompleted = {0, 5878, 5878, 5878},
> >> qlen_lazy = 2,
> >> qlen = 35,
> >> qlen_last_fqs_check = 0,
> >> n_cbs_invoked = 34459,
> >> n_nocbs_invoked = 0,
> >> n_cbs_orphaned = 0,
> >> n_cbs_adopted = 0,
> >> n_force_qs_snap = 1428,
> >> blimit = 10,
> >> dynticks = 0xc5452758,
> >> dynticks_snap = 116840,
> >> dynticks_fqs = 47,
> >> offline_fqs = 0,
> >> n_rcu_pending = 48486,
> >> n_rp_qs_pending = 3,
> >> n_rp_report_qs = 2223,
> >> n_rp_cb_ready = 24,
> >> n_rp_cpu_needs_gp = 40101,
> >> n_rp_gp_completed = 1226,
> >> n_rp_gp_started = 789,
> >> n_rp_need_nothing = 4123,
> >> barrier_head = {
> >> next = 0x0,
> >> func = 0x0
> >> },
> >> oom_head = {
> >> next = 0x0,
> >> func = 0x0
> >> },
> >> cpu = 6,
> >> rsp = 0xc117f340 <rcu_preempt_state>
> >> }
> >>
> >> crash> struct rcu_data c54606b0
> >> struct rcu_data {
> >> completed = 5877,
> >> gpnum = 5877,
> >> passed_quiesce = true,
> >> qs_pending = false,
> >> beenonline = true,
> >> preemptible = true,
> >> mynode = 0xc117f340 <rcu_preempt_state>,
> >> grpmask = 128,
> >> nxtlist = 0xdec32a6c,
> >> nxttail = {0xc54606c4, 0xe6fcf10c, 0xe6fcf10c, 0xe6fcf10c},
> >> nxtcompleted = {0, 5878, 5878, 5878},
> >> qlen_lazy = 1,
> >> qlen = 30,
> >> qlen_last_fqs_check = 0,
> >> n_cbs_invoked = 31998,
> >> n_nocbs_invoked = 0,
> >> n_cbs_orphaned = 0,
> >> n_cbs_adopted = 0,
> >> n_force_qs_snap = 1428,
> >> blimit = 10,
> >> dynticks = 0xc5460758,
> >> dynticks_snap = 57846,
> >> dynticks_fqs = 54,
> >> offline_fqs = 0,
> >> n_rcu_pending = 47502,
> >> n_rp_qs_pending = 2,
> >> n_rp_report_qs = 2142,
> >> n_rp_cb_ready = 37,
> >> n_rp_cpu_needs_gp = 40049,
> >> n_rp_gp_completed = 1223,
> >> n_rp_gp_started = 661,
> >> n_rp_need_nothing = 3390,
> >> barrier_head = {
> >> next = 0x0,
> >> func = 0x0
> >> },
> >> oom_head = {
> >> next = 0x0,
> >> func = 0x0
> >> },
> >> cpu = 7,
> >> rsp = 0xc117f340 <rcu_preempt_state>
> >> }
> >
> >>
> >> rcu_preempt_state = $9 = {
> >> node = {{
> >> lock = {
> >> raw_lock = {
> >> {
> >> slock = 3129850509,
> >> tickets = {
> >> owner = 47757,
> >> next = 47757
> >> }
> >> }
> >> },
> >> magic = 3735899821,
> >> owner_cpu = 4294967295,
> >> owner = 0xffffffff
> >> },
> >> gpnum = 5877,
> >> completed = 5877,
> >> qsmask = 0,
> >> expmask = 0,
> >> qsmaskinit = 240,
> >> grpmask = 0,
> >> grplo = 0,
> >> grphi = 7,
> >> grpnum = 0 '\000',
> >> level = 0 '\000',
> >> parent = 0x0,
> >> blkd_tasks = {
> >> next = 0xc117f378 <rcu_preempt_state+56>,
> >> prev = 0xc117f378 <rcu_preempt_state+56>
> >> },
> >> gp_tasks = 0x0,
> >> exp_tasks = 0x0,
> >> need_future_gp = {1, 0},
> >> fqslock = {
> >> raw_lock = {
> >> {
> >> slock = 0,
> >> tickets = {
> >> owner = 0,
> >> next = 0
> >> }
> >> }
> >> },
> >> magic = 3735899821,
> >> owner_cpu = 4294967295,
> >> owner = 0xffffffff
> >> }
> >> }},
> >> level = {0xc117f340 <rcu_preempt_state>},
> >> levelcnt = {1, 0, 0, 0, 0},
> >> levelspread = "\b",
> >> rda = 0xc115e6b0 <rcu_preempt_data>,
> >> call = 0xc01975ac <call_rcu>,
> >> fqs_state = 0 '\000',
> >> boost = 0 '\000',
> >> gpnum = 5877,
> >> completed = 5877,
> >> gp_kthread = 0xf0c9e600,
> >> gp_wq = {
> >> lock = {
> >> {
> >> rlock = {
> >> raw_lock = {
> >> {
> >> slock = 2160230594,
> >> tickets = {
> >> owner = 32962,
> >> next = 32962
> >> }
> >> }
> >> },
> >> magic = 3735899821,
> >> owner_cpu = 4294967295,
> >> owner = 0xffffffff
> >> }
> >> }
> >> },
> >> task_list = {
> >> next = 0xf0cd1f20,
> >> prev = 0xf0cd1f20
> >> }
> >> },
> >> gp_flags = 1,
> >> orphan_lock = {
> >> raw_lock = {
> >> {
> >> slock = 327685,
> >> tickets = {
> >> owner = 5,
> >> next = 5
> >> }
> >> }
> >> },
> >> magic = 3735899821,
> >> owner_cpu = 4294967295,
> >> owner = 0xffffffff
> >> },
> >> orphan_nxtlist = 0x0,
> >> orphan_nxttail = 0xc117f490 <rcu_preempt_state+336>,
> >> orphan_donelist = 0x0,
> >> orphan_donetail = 0xc117f498 <rcu_preempt_state+344>,
> >> qlen_lazy = 0,
> >> qlen = 0,
> >> onoff_mutex = {
> >> count = {
> >> counter = 1
> >> },
> >> wait_lock = {
> >> {
> >> rlock = {
> >> raw_lock = {
> >> {
> >> slock = 811479134,
> >> tickets = {
> >> owner = 12382,
> >> next = 12382
> >> }
> >> }
> >> },
> >> magic = 3735899821,
> >> owner_cpu = 4294967295,
> >> owner = 0xffffffff
> >> }
> >> }
> >> },
> >> wait_list = {
> >> next = 0xc117f4bc <rcu_preempt_state+380>,
> >> prev = 0xc117f4bc <rcu_preempt_state+380>
> >> },
> >> owner = 0x0,
> >> name = 0x0,
> >> magic = 0xc117f4a8 <rcu_preempt_state+360>
> >> },
> >> barrier_mutex = {
> >> count = {
> >> counter = 1
> >> },
> >> wait_lock = {
> >> {
> >> rlock = {
> >> raw_lock = {
> >> {
> >> slock = 0,
> >> tickets = {
> >> owner = 0,
> >> next = 0
> >> }
> >> }
> >> },
> >> magic = 3735899821,
> >> owner_cpu = 4294967295,
> >> owner = 0xffffffff
> >> }
> >> }
> >> },
> >> wait_list = {
> >> next = 0xc117f4e4 <rcu_preempt_state+420>,
> >> prev = 0xc117f4e4 <rcu_preempt_state+420>
> >> },
> >> owner = 0x0,
> >> name = 0x0,
> >> magic = 0xc117f4d0 <rcu_preempt_state+400>
> >> },
> >> barrier_cpu_count = {
> >> counter = 0
> >> },
> >> barrier_completion = {
> >> done = 0,
> >> wait = {
> >> lock = {
> >> {
> >> rlock = {
> >> raw_lock = {
> >> {
> >> slock = 0,
> >> tickets = {
> >> owner = 0,
> >> next = 0
> >> }
> >> }
> >> },
> >> magic = 0,
> >> owner_cpu = 0,
> >> owner = 0x0
> >> }
> >> }
> >> },
> >> task_list = {
> >> next = 0x0,
> >> prev = 0x0
> >> }
> >> }
> >> },
> >> n_barrier_done = 0,
> >> expedited_start = {
> >> counter = 0
> >> },
> >> expedited_done = {
> >> counter = 0
> >> },
> >> expedited_wrap = {
> >> counter = 0
> >> },
> >> expedited_tryfail = {
> >> counter = 0
> >> },
> >> expedited_workdone1 = {
> >> counter = 0
> >> },
> >> expedited_workdone2 = {
> >> counter = 0
> >> },
> >> expedited_normal = {
> >> counter = 0
> >> },
> >> expedited_stoppedcpus = {
> >> counter = 0
> >> },
> >> expedited_done_tries = {
> >> counter = 0
> >> },
> >> expedited_done_lost = {
> >> counter = 0
> >> },
> >> expedited_done_exit = {
> >> counter = 0
> >> },
> >> jiffies_force_qs = 4294963917,
> >> n_force_qs = 4028,
> >> n_force_qs_lh = 0,
> >> n_force_qs_ngp = 0,
> >> gp_start = 4294963911,
> >> jiffies_stall = 4294966011,
> >> gp_max = 17,
> >> name = 0xc0d833ab "rcu_preempt",
> >> abbr = 112 'p',
> >> flavors = {
> >> next = 0xc117f2ec <rcu_bh_state+556>,
> >> prev = 0xc117f300 <rcu_struct_flavors>
> >> },
> >> wakeup_work = {
> >> flags = 3,
> >> llnode = {
> >> next = 0x0
> >> },
> >> func = 0xc0195aa8 <rsp_wakeup>
> >> }
> >>
> >>
> >
>
>

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/