Re: Instrumentation and RCU

From: Paul E. McKenney
Date: Tue Mar 10 2020 - 13:26:44 EST


On Tue, Mar 10, 2020 at 01:22:45PM -0400, Mathieu Desnoyers wrote:
> ----- On Mar 10, 2020, at 12:49 PM, paulmck paulmck@xxxxxxxxxx wrote:
>
> > On Tue, Mar 10, 2020 at 11:13:27AM -0400, Mathieu Desnoyers wrote:
> >>
> >>
> >> ----- On Mar 9, 2020, at 4:47 PM, paulmck paulmck@xxxxxxxxxx wrote:
> >> [...]
> >>
> >> >
> >> > Suppose that we had a variant of RCU that had about the same read-side
> >> > overhead as Preempt-RCU, but which could be used from idle as well as
> >> > from CPUs in the process of coming online or going offline? I have not
> >> > thought through the irq/NMI/exception entry/exit cases, but I don't see
> >> > why that would be a problem.
> >> >
> >> > This would have explicit critical-section entry/exit code, so it would
> >> > not be any help for trampolines.
> >> >
> >> > Would such a variant of RCU help?
> >> >
> >> > Yeah, I know. Just what the kernel doesn't need, yet another variant
> >> > of RCU...
> >>
> >> Hi Paul,
> >>
> >> I think that before introducing yet another RCU flavor, it's important
> >> to take a step back and look at the tracer requirements first. If those
> >> end up being covered by currently available RCU flavors, then why add
> >> another?
> >
> > Well, we have BPF requirements as well.
> >
> >> I can start with a few use-cases I have in mind. Others should feel free
> >> to pitch in:
> >>
> >> Tracing callsite context:
> >>
> >> 1) Thread context
> >>
> >> 1.1) Preemption enabled
> >>
> >> One tracepoint in this category is syscall enter/exit. We should introduce
> >> a variant of tracepoints relying on SRCU for this use-case so we can take
> >> page faults when fetching userspace data.
> >
> > Agreed, SRCU works fine for the page-fault case, as the read-side memory
> > barriers are in the noise compared to page-fault overhead. Back in
> > the day, there were light-weight system calls. Are all of these now
> > converted to VDSO or similar?
>
> There is a big difference between allowing page faults to happen, and expecting
> page faults to happen every time. I suspect many use-cases will end up having
> a fast-path which touches user-space data which is in the page cache, but
> may end up triggering page faults on rare occasions.
>
> Therefore, this might justify an SRCU which has a low-overhead read-side.

OK, good to know, thank you!

Thanx, Paul