Re: tty^Wrcu/perf lockdep trace.

From: Peter Zijlstra
Date: Mon Oct 07 2013 - 07:24:42 EST


On Fri, Oct 04, 2013 at 05:23:48PM -0700, Paul E. McKenney wrote:
> The underlying problem is that perf is invoking call_rcu() with the
> scheduler locks held, but in NOCB mode, call_rcu() will with high
> probability invoke the scheduler -- which just might want to use its
> locks. The reason that call_rcu() needs to invoke the scheduler is
> to wake up the corresponding rcuo callback-offload kthread, which
> does the job of starting up a grace period and invoking the callbacks
> afterwards.
>
> One solution (championed on a related problem by Lai Jiangshan) is to

That's rcu_read_unlock_special(), right?

> simply defer the wakeup to some point where scheduler locks are no longer
> held. Since we don't want to unnecessarily incur the cost of such
> deferral, the task before us is threefold:
>
> 1. Determine when it is likely that a relevant scheduler lock is held.
>
> 2. Defer the wakeup in such cases.
>
> 3. Ensure that all deferred wakeups eventually happen, preferably
> sooner rather than later.
>
> We use irqs_disabled_flags() as a proxy for relevant scheduler locks
> being held. This works because the relevant locks are always acquired
> with interrupts disabled. We may defer more often than needed, but that
> is at least safe.

Fair enough; do you feel the need for something more specific?
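
For reference, a minimal userspace sketch of the defer-or-wake decision
described above. It is a model under stated assumptions, not the kernel
code: struct model_rdp, model_wake() and model_nocb_enqueue_wake() are
hypothetical names, and the irqs_off argument stands in for the result of
irqs_disabled_flags().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the rcu_data fields discussed in this thread. */
struct model_rdp {
	bool nocb_defer_wakeup;	/* a wakeup is owed to the rcuo kthread */
	int wakeups;		/* wakeups actually issued (models wake_up()) */
};

/* Models wake_up(&rdp->nocb_wq): here it only counts the call. */
static void model_wake(struct model_rdp *rdp)
{
	assert(rdp != NULL);
	rdp->wakeups++;
}

/*
 * Models the call_rcu() side. With interrupts off we assume a scheduler
 * lock might be held, so the wakeup is only recorded, not performed.
 */
static void model_nocb_enqueue_wake(struct model_rdp *rdp, bool irqs_off)
{
	if (irqs_off) {
		rdp->nocb_defer_wakeup = true;	/* step 2: defer */
		return;
	}
	model_wake(rdp);			/* safe to wake right away */
}
```

Deferring whenever interrupts are off is conservative: it may defer when
no scheduler lock is held, but it never wakes while one is.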

> The wakeup deferral is tracked via a new field in the per-CPU and
> per-RCU-flavor rcu_data structure, namely ->nocb_defer_wakeup.
>
> This flag is checked by the RCU core processing. The __rcu_pending()
> function now checks this flag, which causes rcu_check_callbacks()
> to initiate RCU core processing at each scheduling-clock interrupt
> where this flag is set. Of course this is not sufficient because
> scheduling-clock interrupts are often turned off (the things we used to
> be able to count on!). So the flags are also checked on entry to any
> state that RCU considers to be idle, which includes both NO_HZ_IDLE idle
> state and NO_HZ_FULL user-mode-execution state.

So RCU doesn't currently differentiate between EQS for nr_running==1 and
nr_running==0?
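
A rough userspace model of the two check sites from the quoted paragraph,
the scheduling-clock tick and entry to an RCU-idle state. All names here
(model_rdp, model_flush(), model_tick(), model_eqs_enter()) are
hypothetical and the tick is reduced to a boolean; it only illustrates why
the tick alone is not enough.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the relevant rcu_data state. */
struct model_rdp {
	bool nocb_defer_wakeup;	/* a wakeup is owed to the rcuo kthread */
	int wakeups;		/* wakeups actually issued */
};

/* Perform the owed wakeup, if any, and clear the flag. */
static void model_flush(struct model_rdp *rdp)
{
	assert(rdp != NULL);
	if (!rdp->nocb_defer_wakeup)
		return;
	rdp->nocb_defer_wakeup = false;
	rdp->wakeups++;
}

/* Models the flag check added to __rcu_pending(). */
static bool model_rcu_pending(const struct model_rdp *rdp)
{
	return rdp->nocb_defer_wakeup;
}

/* Tick path: does nothing at all when the tick is turned off. */
static void model_tick(struct model_rdp *rdp, bool tick_enabled)
{
	if (tick_enabled && model_rcu_pending(rdp))
		model_flush(rdp);
}

/* Idle/user entry path: covers the case where the tick is off. */
static void model_eqs_enter(struct model_rdp *rdp)
{
	model_flush(rdp);
}
```

In this model the EQS entry path flushes unconditionally, which matches the
description above of a single "idle" notion covering both NO_HZ_IDLE and
NO_HZ_FULL user mode.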

> This approach should allow call_rcu() to be invoked regardless of what
> locks you might be holding, the key word being "should".

Agreed. Except it looks like you've inverted the deferred wakeup
condition :-)

> @@ -2314,6 +2323,22 @@ static int rcu_nocb_kthread(void *arg)
> return 0;
> }
>
> +/* Is a deferred wakeup of rcu_nocb_kthread() required? */
> +static bool rcu_nocb_need_deferred_wakeup(struct rcu_data *rdp)
> +{
> +	return ACCESS_ONCE(rdp->nocb_defer_wakeup);
> +}
> +
> +/* Do a deferred wakeup of rcu_nocb_kthread(). */
> +static void do_nocb_deferred_wakeup(struct rcu_data *rdp)
> +{
> +	if (rcu_nocb_need_deferred_wakeup(rdp))

!rcu_nocb_need_deferred_wakeup() ?

> +		return;
> +	ACCESS_ONCE(rdp->nocb_defer_wakeup) = false;
> +	wake_up(&rdp->nocb_wq);
> +	trace_rcu_nocb_wake(rdp->rsp->name, rdp->cpu, TPS("DeferredWakeEmpty"));
> +}
> +
> /* Initialize per-rcu_data variables for no-CBs CPUs. */
> static void __init rcu_boot_init_nocb_percpu_data(struct rcu_data *rdp)
> {
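
To spell out the inversion noted above, here is a small userspace sketch of
the check with the condition negated. It is an illustration only: the
struct and function names are hypothetical, plain loads and stores replace
ACCESS_ONCE(), and a counter replaces wake_up().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the relevant rcu_data state. */
struct model_rdp {
	bool nocb_defer_wakeup;	/* a wakeup is owed to the rcuo kthread */
	int wakeups;		/* wakeups actually issued */
};

/* Is a deferred wakeup required? */
static bool model_need_deferred_wakeup(const struct model_rdp *rdp)
{
	return rdp->nocb_defer_wakeup;
}

/* Do a deferred wakeup, but only if one is actually pending. */
static void model_do_deferred_wakeup(struct model_rdp *rdp)
{
	assert(rdp != NULL);
	if (!model_need_deferred_wakeup(rdp))	/* note the negation */
		return;
	rdp->nocb_defer_wakeup = false;
	rdp->wakeups++;				/* models wake_up() */
}
```

Without the negation the function returns early exactly when a wakeup is
owed and issues one when none is, so a deferred wakeup would never be
delivered by this path.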
