Re: Instrumentation and RCU

From: Paul E. McKenney
Date: Mon Mar 09 2020 - 22:26:52 EST


On Tue, Mar 10, 2020 at 12:52:11AM +0100, Frederic Weisbecker wrote:
> On Mon, Mar 09, 2020 at 01:47:10PM -0700, Paul E. McKenney wrote:
> > On Mon, Mar 09, 2020 at 06:02:32PM +0100, Thomas Gleixner wrote:
> > > #3) RCU idle
> > >
> > > Being able to trace code inside RCU idle sections is very similar to
> > > the question raised in #1.
> > >
> > > Assume all of the instrumentation would be doing conditional RCU
> > > schemes, i.e.:
> > >
> > >     if (rcuidle)
> > >         ....
> > >     else
> > >         rcu_read_lock_sched()
> > >
> > > before invoking the actual instrumentation functions and of course
> > > undoing that right after it, that really begs the question whether
> > > it's worth it.
> > >
> > > Especially constructs like:
> > >
> > >     trace_hardirqs_off()
> > >         idx = srcu_read_lock()
> > >         rcu_irq_enter_irqson();
> > >         ...
> > >         rcu_irq_exit_irqson();
> > >         srcu_read_unlock(idx);
> > >
> > >     if (user_mode)
> > >         user_exit_irqsoff();
> > >     else
> > >         rcu_irq_enter();
> > >
> > > are really more than questionable. For 99.9999% of instrumentation
> > > users it's absolutely irrelevant whether this traces the interrupt
> > > disabled time of user_exit_irqsoff() or rcu_irq_enter() or not.
> > >
> > > But what's relevant is the tracer overhead which is e.g. inflicted
> > > with today's trace_hardirqs_off/on() implementation because that
> > > unconditionally uses the rcuidle variant with the srcu/rcu_irq dance
> > > around every tracepoint.
> > >
> > > Even if the tracepoint sits in the ASM code it just covers about 20
> > > low level ASM instructions more. The tracer invocation, which is
> > > even done twice when coming from user space on x86 (the second call
> > > is optimized in the tracer C-code), costs definitely way more
> > > cycles. When you take the srcu/rcu_irq dance into account it's a
> > > complete disaster performance-wise.
> >
> > Suppose that we had a variant of RCU that had about the same read-side
> > overhead as Preempt-RCU, but which could be used from idle as well as
> > from CPUs in the process of coming online or going offline? I have not
> > thought through the irq/NMI/exception entry/exit cases, but I don't see
> > why that would be a problem.
> >
> > This would have explicit critical-section entry/exit code, so it would
> > not be any help for trampolines.
> >
> > Would such a variant of RCU help?
> >
> > Yeah, I know. Just what the kernel doesn't need, yet another variant
> > of RCU...
>
> I was thinking about having a tracing-specific implementation of RCU.
> Last week Steve told me that the tracing ring buffer has its own ad-hoc
> RCU implementation which schedules a thread on each CPU to complete a grace
> period (did I understand it right?). Of course such a flavour of RCU wouldn't
> be nice to nohz_full but surely we can arrange some tweaks for those who
> require strong isolation. I'm sure you're having a much better idea though.

Well, that too. Please see CONFIG_TASKS_RCU_RUDE in current
"dev" on -rcu. But yes, another is on its way...

Hey, it compiled, so it must be perfect, right? :-/

Thanx, Paul