Re: There is a Tasks RCU stall warning

From: Paul E. McKenney
Date: Wed Apr 12 2017 - 10:48:12 EST


On Tue, Apr 11, 2017 at 04:11:38PM -0700, Paul E. McKenney wrote:
> On Tue, Apr 11, 2017 at 04:04:45PM -0700, Paul E. McKenney wrote:
> > On Tue, Apr 11, 2017 at 04:01:54PM -0700, Paul E. McKenney wrote:
> > > On Tue, Apr 11, 2017 at 06:15:30PM -0400, Steven Rostedt wrote:
> > > > On Tue, 11 Apr 2017 14:56:56 -0700
> > > > "Paul E. McKenney" <paulmck@xxxxxxxxxxxxxxxxxx> wrote:
> > > >
> > > > > On Tue, Apr 11, 2017 at 05:49:53PM -0400, Steven Rostedt wrote:
> > > > > > On Tue, 11 Apr 2017 14:44:43 -0700
> > > > > > "Paul E. McKenney" <paulmck@xxxxxxxxxxxxxxxxxx> wrote:
> > > > > >
> > > > > >
> > > > > > > Works for me!
> > > > > > >
> > > > > > > Hopefully it will also work for your computer. :-)
> > > > > > >
> > > > > > > And whew! Glad to see that the stall warnings worked!
> > > > > >
> > > > > > Ah! but I think I found a bug in synchronize_rcu_tasks()!
> > > > > >
> > > > > > Calling schedule isn't good enough. For rcu_tasks to continue, the task
> > > > > > needs to schedule out. With my updated code, I just triggered:
> > > > > >
> > > > > > [ 196.276868] INFO: rcu_tasks detected stalls on tasks:
> > > > > > [ 196.284294] ffff8800c26f8040: .. nvcsw: 2/2 holdout: 1 idle_cpu: -1/1
> > > > > > [ 196.293175] event_benchmark R running task 30536 1127 2 0x10000000
> > > > > > [ 196.302746] Call Trace:
> > > > > > [ 196.307640] ? _raw_spin_unlock_irq+0x1f/0x50
> > > > > > [ 196.314453] __schedule+0x222/0x1210
> > > > > > [ 196.320476] ? pci_mmcfg_check_reserved+0xc0/0xc0
> > > > > > [ 196.327616] ? preempt_count_add+0xb7/0xf0
> > > > > > [ 196.334174] ? __asan_store8+0x15/0x70
> > > > > > [ 196.340384] schedule+0x57/0xe0
> > > > > > [ 196.345888] benchmark_event_kthread+0x2e/0x3c0
> > > > > > [ 196.352823] kthread+0x178/0x1d0
> > > > > > [ 196.358411] ? trace_benchmark_reg+0x80/0x80
> > > > > > [ 196.365073] ? kthread_create_on_node+0xa0/0xa0
> > > > > > [ 196.371999] ret_from_fork+0x2e/0x40
> > > > > >
> > > > > >
> > > > > > And here my benchmark called schedule(), but nothing scheduled it out,
> > > > > > and it still fails on rcu_tasks.
> > > > >
> > > > > Good point!
> > > > >
> > > > > Hmmmm... I cannot hook into rcu_note_context_switch() because that gets
> > > > > called for preemption as well as for voluntary context switches.
> > > >
> > > > If you pass in the "preempt" variable, it would work. It's false when
> > > > __schedule() was called by a "schedule()" directly, and true when
> > > > called by one of the preempt_schedule() functions.
> > >
> > > Like this? (Untested, but builds at least some of the time.)
> >
> > Not like that... :-/ Update on its way.
>
> Perhaps more like this. Started rcutorture on it, will see how it goes.

Do you need this patch? If so, I should do some more work on it to
eliminate the extra common-case branch on the scheduler fastpath.

Thanx, Paul

> ------------------------------------------------------------------------
>
> commit c9653896be9b79b7227e8b97710ad6475fc72dc1
> Author: Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>
> Date: Tue Apr 11 15:50:41 2017 -0700
>
> rcu: Make non-preemptive schedule be Tasks RCU quiescent state
>
> Currently, a call to schedule() acts as a Tasks RCU quiescent state
> only if a context switch actually takes place. However, just the
> call to schedule() guarantees that the calling task has moved off of
> whatever tracing trampoline that it might have been one previously.
> This commit therefore plumbs schedule()'s "preempt" parameter into
> rcu_note_context_switch(), which then records the Tasks RCU quiescent
> state, but only if this call to schedule() was -not- due to a preemption.
>
> Suggested-by: Steven Rostedt <rostedt@xxxxxxxxxxx>
> Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>
>
> diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> index e6146d0074f8..f531b29207da 100644
> --- a/include/linux/rcupdate.h
> +++ b/include/linux/rcupdate.h
> @@ -363,15 +363,20 @@ static inline void rcu_init_nohz(void)
> #ifdef CONFIG_TASKS_RCU
> #define TASKS_RCU(x) x
> extern struct srcu_struct tasks_rcu_exit_srcu;
> -#define rcu_note_voluntary_context_switch(t) \
> +#define rcu_note_voluntary_context_switch_lite(t) \
> do { \
> - rcu_all_qs(); \
> if (READ_ONCE((t)->rcu_tasks_holdout)) \
> WRITE_ONCE((t)->rcu_tasks_holdout, false); \
> } while (0)
> +#define rcu_note_voluntary_context_switch(t) \
> + do { \
> + rcu_all_qs(); \
> + rcu_note_voluntary_context_switch_lite(t); \
> + } while (0)
> #else /* #ifdef CONFIG_TASKS_RCU */
> #define TASKS_RCU(x) do { } while (0)
> -#define rcu_note_voluntary_context_switch(t) rcu_all_qs()
> +#define rcu_note_voluntary_context_switch_lite(t) do { } while (0)
> +#define rcu_note_voluntary_context_switch(t) rcu_all_qs()
> #endif /* #else #ifdef CONFIG_TASKS_RCU */
>
> /**
> diff --git a/include/linux/rcutiny.h b/include/linux/rcutiny.h
> index 5219be250f00..21ea52df651d 100644
> --- a/include/linux/rcutiny.h
> +++ b/include/linux/rcutiny.h
> @@ -92,10 +92,11 @@ static inline void kfree_call_rcu(struct rcu_head *head,
> call_rcu(head, func);
> }
>
> -static inline void rcu_note_context_switch(void)
> -{
> - rcu_sched_qs();
> -}
> +#define rcu_note_context_switch(preempt) \
> + do { \
> + rcu_sched_qs(); \
> + rcu_note_voluntary_context_switch_lite(current); \
> + } while (0)
>
> /*
> * Take advantage of the fact that there is only one CPU, which
> diff --git a/include/linux/rcutree.h b/include/linux/rcutree.h
> index 63a4e4cf40a5..04a2caf3ab8b 100644
> --- a/include/linux/rcutree.h
> +++ b/include/linux/rcutree.h
> @@ -30,7 +30,7 @@
> #ifndef __LINUX_RCUTREE_H
> #define __LINUX_RCUTREE_H
>
> -void rcu_note_context_switch(void);
> +void rcu_note_context_switch(bool preempt);
> int rcu_needs_cpu(u64 basem, u64 *nextevt);
> void rcu_cpu_stall_reset(void);
>
> @@ -41,7 +41,7 @@ void rcu_cpu_stall_reset(void);
> */
> static inline void rcu_virt_note_context_switch(int cpu)
> {
> - rcu_note_context_switch();
> + rcu_note_context_switch(false);
> }
>
> void synchronize_rcu_bh(void);
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index f13fc4ab2f0d..92cf78fffd31 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -458,13 +458,15 @@ static void rcu_momentary_dyntick_idle(void)
> * and requires special handling for preemptible RCU.
> * The caller must have disabled interrupts.
> */
> -void rcu_note_context_switch(void)
> +void rcu_note_context_switch(bool preempt)
> {
> struct rcu_state *rsp;
>
> barrier(); /* Avoid RCU read-side critical sections leaking down. */
> trace_rcu_utilization(TPS("Start context switch"));
> rcu_sched_qs();
> + if (!preempt)
> + rcu_note_voluntary_context_switch_lite(current);
> rcu_preempt_note_context_switch();
> /* Load rcu_urgent_qs before other flags. */
> if (!smp_load_acquire(this_cpu_ptr(&rcu_dynticks.rcu_urgent_qs)))
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 9c38e6be4f3e..653ea8cf89e4 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -3378,7 +3378,7 @@ static void __sched notrace __schedule(bool preempt)
> hrtick_clear(rq);
>
> local_irq_disable();
> - rcu_note_context_switch();
> + rcu_note_context_switch(preempt);
>
> /*
> * Make sure that signal_pending_state()->signal_pending() below