Re: Instrumentation and RCU

From: Frederic Weisbecker
Date: Mon Mar 09 2020 - 19:52:19 EST


On Mon, Mar 09, 2020 at 01:47:10PM -0700, Paul E. McKenney wrote:
> On Mon, Mar 09, 2020 at 06:02:32PM +0100, Thomas Gleixner wrote:
> > #3) RCU idle
> >
> > Being able to trace code inside RCU idle sections is very similar to
> > the question raised in #1.
> >
> > Assume all of the instrumentation would be doing conditional RCU
> > schemes, i.e.:
> >
> >     if (rcuidle)
> >         ....
> >     else
> >         rcu_read_lock_sched()
> >
> > before invoking the actual instrumentation functions and of course
> > undoing that right after it, that really begs the question whether
> > it's worth it.
> >
> > Especially constructs like:
> >
> >     trace_hardirqs_off()
> >         idx = srcu_read_lock()
> >         rcu_irq_enter_irqson();
> >         ...
> >         rcu_irq_exit_irqson();
> >         srcu_read_unlock(idx);
> >
> >     if (user_mode)
> >         user_exit_irqsoff();
> >     else
> >         rcu_irq_enter();
> >
> > are really more than questionable. For 99.9999% of instrumentation
> > users it's absolutely irrelevant whether this traces the interrupt
> > disabled time of user_exit_irqsoff() or rcu_irq_enter() or not.
> >
> > But what's relevant is the tracer overhead which is e.g. inflicted
> > with today's trace_hardirqs_off/on() implementation because that
> > unconditionally uses the rcuidle variant with the srcu/rcu_irq dance
> > around every tracepoint.
> >
> > Even if the tracepoint sits in the ASM code it just covers about 20
> > more low-level ASM instructions. The tracer invocation, which is
> > even done twice when coming from user space on x86 (the second call
> > is optimized in the tracer C-code), definitely costs way more
> > cycles. When you take the srcu/rcu_irq dance into account it's a
> > complete disaster performance-wise.
>
> Suppose that we had a variant of RCU that had about the same read-side
> overhead as Preempt-RCU, but which could be used from idle as well as
> from CPUs in the process of coming online or going offline? I have not
> thought through the irq/NMI/exception entry/exit cases, but I don't see
> why that would be a problem.
>
> This would have explicit critical-section entry/exit code, so it would
> not be any help for trampolines.
>
> Would such a variant of RCU help?
>
> Yeah, I know. Just what the kernel doesn't need, yet another variant
> of RCU...
>

I was thinking about having a tracing-specific implementation of RCU.
Last week Steve told me that the tracing ring buffer has its own ad-hoc
RCU implementation which schedules a thread on each CPU to complete a grace
period (did I understand it right?). Of course such a flavour of RCU wouldn't
be nice to nohz_full, but surely we can arrange some tweaks for those who
require strong isolation. I'm sure you have a much better idea though.
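
To make the idea concrete, here is a toy userspace model of a grace period
that completes once a queued work item has run on every CPU. It is only a
sketch under assumptions: it is not kernel code and not the ring buffer's
actual implementation, and every name in it (toy_cpu, toy_read_lock(),
toy_gp_start(), toy_schedule(), toy_gp_done()) is hypothetical. The
assumption being modelled is that a read-side section cannot be preempted,
so the queued work can only run on a CPU after that CPU has left its reader.

```c
/* Toy userspace model, NOT kernel code. All names are hypothetical. */
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4

struct toy_cpu {
	bool in_reader;    /* inside a non-preemptible read-side section */
	bool sync_pending; /* grace-period work item queued on this CPU */
};

static struct toy_cpu cpus[NR_CPUS];

/* Read side: only a flag in this model; the assumption is that the
 * real thing would simply be a region that cannot be preempted. */
static void toy_read_lock(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	cpus[cpu].in_reader = true;
}

static void toy_read_unlock(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	cpus[cpu].in_reader = false;
}

/* Start a grace period: queue a no-op work item on every CPU. */
static void toy_gp_start(void)
{
	for (int i = 0; i < NR_CPUS; i++)
		cpus[i].sync_pending = true;
}

/* Model a scheduling point on one CPU: the queued work item can only
 * run once the CPU is outside its read-side section. */
static void toy_schedule(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	if (!cpus[cpu].in_reader)
		cpus[cpu].sync_pending = false;
}

/* The grace period is over once the work item has run on every CPU. */
static bool toy_gp_done(void)
{
	for (int i = 0; i < NR_CPUS; i++)
		if (cpus[i].sync_pending)
			return false;
	return true;
}
```

In this model toy_gp_start() has to touch every CPU, including ones that
never entered a reader, which is the part that wouldn't be nice to
nohz_full.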