Re: tty^Wrcu/perf lockdep trace.

From: Peter Zijlstra
Date: Mon Oct 07 2013 - 04:42:57 EST


On Sat, Oct 05, 2013 at 03:03:11PM -0700, Paul E. McKenney wrote:
> In theory, we could do that. But in practice, what would wake us up
> when the CPUs go non-idle?
>
> 1. We could do a wakeup on the idle-to-non-idle transition. That
> would increase idle-to-non-idle latency, defeating the purpose
> of rcu_nocb_poll=y. Plus there are workloads that enter and
> exit idle extremely quickly, which would not be good for either
> performance, scalability, or energy efficiency.
>
> 2. We could have some other thread poll all the CPUs for activity,
> for example, the RCU grace-period kthreads. This might actually
> work, but there are some really ugly races involving CPUs becoming
> active just long enough to post a callback, going to sleep,
> with no other RCU activity in the system. This could easily
> result in a system hang.
>
> 3. We could post a timeout to check for the corresponding CPU
> being idle, but that just transfers the wakeups from idle from
> the rcuo kthreads to the other CPUs.
>
> 4. I remove rcu_nocb_poll and see if anyone complains. That doesn't
> solve the deadlock problem, but it does simplify RCU a bit. ;-)
>
> Other thoughts?

So we already move all the nocb rcuo threads over to the timekeeping
cpu, right? Giving you n threads to wake and/or poll and that's
expensive.

So why doesn't the time-keeping cpu, which is awake when at least one of
the nocb cpus is awake, poll the nocb cpus' callback lists?

Arguably you don't want to do that from the old scheduler tick interrupt
or softirq context thingy, but from a kthread; and you've already got
all that around.

At that point you've got a single kthread periodically being woken by
the scheduler timer interrupt -- which still goes away when the entire
machine goes idle -- which would do something like:


	for_each_cpu(cpu, nocb_cpus_mask) {
		if (!list_empty_careful(&per_cpu(rcu_state, cpu)->callbacks))
			advance_cpu_callbacks(cpu);
	}


That fully preserves the !NOCB state of affairs while also dealing with
the NOCB stuff. And the single remote read only gets really expensive
once you go _very_ large or once the cpu in question actually touched
the cacheline and moved it into exclusive mode due to writing to it; at
which point you've saved yourself a wakeup and we're still faster.
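
To make the loop above concrete, here is a rough userspace model of it, a
sketch only and not kernel code. Everything in it is a hypothetical stand-in:
struct cb, cpu_state[], post_callback(), poll_nocb_cpus() and count_cb() are
made up for illustration, the mask is a plain unsigned long rather than a
cpumask, and there is no locking or memory ordering. It also collapses
grace-period handling entirely: this advance_cpu_callbacks() just invokes
whatever it finds, which real RCU must not do before a grace period has
elapsed.

```c
#include <stddef.h>

#define NR_CPUS 8	/* hypothetical machine size for this model */

/* Hypothetical stand-in for a callback; not the kernel's struct rcu_head. */
struct cb {
	struct cb *next;
	void (*func)(struct cb *cb);
};

/* Hypothetical per-cpu state: a singly linked list of pending callbacks. */
struct cpu_state {
	struct cb *head;
	struct cb **tail;
};

static struct cpu_state cpu_state[NR_CPUS];
static unsigned long nocb_cpus_mask;	/* bit N set => cpu N is a nocb cpu */
static int invoked;			/* bumped by the example callback */

static void cpu_state_init(void)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		cpu_state[cpu].head = NULL;
		cpu_state[cpu].tail = &cpu_state[cpu].head;
	}
}

/* Example callback: only counts how often it ran. */
static void count_cb(struct cb *cb)
{
	(void)cb;
	invoked++;
}

/* What the nocb cpu does when posting: append only, wake nobody. */
static void post_callback(int cpu, struct cb *cb, void (*func)(struct cb *))
{
	cb->next = NULL;
	cb->func = func;
	*cpu_state[cpu].tail = cb;
	cpu_state[cpu].tail = &cb->next;
}

/* Detach the cpu's list and invoke it; returns the number of callbacks run. */
static int advance_cpu_callbacks(int cpu)
{
	struct cb *cb = cpu_state[cpu].head;
	int n = 0;

	cpu_state[cpu].head = NULL;
	cpu_state[cpu].tail = &cpu_state[cpu].head;
	while (cb) {
		struct cb *next = cb->next;	/* func may reuse cb */

		cb->func(cb);
		cb = next;
		n++;
	}
	return n;
}

/* One pass of the single poller; returns the number of callbacks run. */
static int poll_nocb_cpus(void)
{
	int total = 0;

	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		if (!(nocb_cpus_mask & (1UL << cpu)))
			continue;
		if (cpu_state[cpu].head)	/* the single remote read */
			total += advance_cpu_callbacks(cpu);
	}
	return total;
}
```

In this model the posting side never wakes anything; the only cross-cpu cost
is the one read of head per nocb cpu per pass, and cpus outside
nocb_cpus_mask are never touched by the poller.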

It automatically deals with the full idle case, it basically gives you
'poll' behaviour for nr_running==1, and to me it appears to be the simplest
and most straightforward extension of the RCU model.

More importantly it does away with that wakeup that so often happens on
nocb cpus. Although, rereading your email, I get the impression we do
this wakeup even on !nocb cpus when CONFIG_NOCB=y, which seems another
undesired feature.


Maybe you've already thought of this and there's a very good reason
things aren't like this; but I've been away for a little while and
need to catch up a bit.
