Re: Tasks RCU vs Preempt RCU

From: Steven Rostedt
Date: Mon May 21 2018 - 21:05:30 EST

Next message: Vincent Chen: "Re: nds32 build failures"
Previous message: Dave Young: "Re: [PATCH] kdump: add default crashkernel reserve kernel config options"
In reply to: Joel Fernandes: "Re: Tasks RCU vs Preempt RCU"
Next in thread: Joel Fernandes: "Re: Tasks RCU vs Preempt RCU"

On Sun, 20 May 2018 12:18:46 -0700
Joel Fernandes <joel@xxxxxxxxxxxxxxxxx> wrote:

> > There is no feasible way to know when a task is on a trampoline
> > without adding overhead that negates the speed up we receive by making
> > individual trampolines to begin with.
>
> Are you speaking of time overhead or space overhead, or both?

Both.

>
> Just thinking out loud and probably some food for thought..
>
> The rcu_read_lock/unlock primitives are extremely fast, so I don't personally
> think there's a time hit.
>
> Could we get around the trampoline code == data issue by say using a
> multi-stage trampoline like so? :
>
> call func_tramp --> (static trampoline)        (dynamic trampoline)
>                     rcu_read_lock()   -------> set up stack
>                                                call function_tracer()
>                                                pop stack
>                     rcu_read_unlock() <------- ret
>
> I know there's probably more to it than this, but conceptually at least, it

Yes, there is more to it. Think about why we create a dynamic
trampoline. It is to make a custom call per callback for a group of
functions being traced by that callback.

Now, if we make that static trampoline, we just lost the reason for the
dynamic one. How would that work if you have 5 different users of the
callbacks (and let's not forget about optimized kprobes)? How do you
jump from the static trampoline to the dynamic one with a single call?
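
To make the cost difference concrete, here is a rough userspace sketch, not
the actual ftrace code. The names (struct tracer_ops, shared_trampoline,
dynamic_trampoline) and the address-range filter are assumptions made up for
illustration; the real filters are hashes. It only counts how many filter
lookups each scheme performs when 5 users are registered and one traced
function is called repeatedly.

```c
#include <assert.h>
#include <stddef.h>

#define NUSERS 5

/* Hypothetical, simplified stand-in for one tracing user's registration. */
struct tracer_ops {
	unsigned long lo, hi;   /* traced range [lo, hi), stand-in for a filter hash */
	unsigned long hits;     /* callbacks delivered to this user */
	unsigned long checks;   /* filter lookups done on behalf of this user */
};

/* One shared entry point: every call walks all users and tests each filter. */
void shared_trampoline(struct tracer_ops *ops, size_t n, unsigned long ip)
{
	for (size_t i = 0; i < n; i++) {
		ops[i].checks++;
		if (ip >= ops[i].lo && ip < ops[i].hi)
			ops[i].hits++;
	}
}

/* Per-user trampoline: the call site already points here, so no lookup. */
void dynamic_trampoline(struct tracer_ops *op, unsigned long ip)
{
	(void)ip;
	op->hits++;
}

/* Give each user its own disjoint range of 100 addresses. */
static void setup(struct tracer_ops *ops)
{
	for (size_t i = 0; i < NUSERS; i++) {
		ops[i].lo = i * 100;
		ops[i].hi = i * 100 + 100;
		ops[i].hits = 0;
		ops[i].checks = 0;
	}
}

/* Total filter lookups when user 0's function is hit `calls` times. */
unsigned long demo_shared_checks(unsigned long calls)
{
	struct tracer_ops ops[NUSERS];
	unsigned long total = 0;

	setup(ops);
	for (unsigned long c = 0; c < calls; c++)
		shared_trampoline(ops, NUSERS, 50);   /* 50 is in user 0's range */
	for (size_t i = 0; i < NUSERS; i++)
		total += ops[i].checks;
	return total;
}

/* Same workload, but the call site jumps straight to user 0's trampoline. */
unsigned long demo_dynamic_checks(unsigned long calls)
{
	struct tracer_ops ops[NUSERS];

	setup(ops);
	for (unsigned long c = 0; c < calls; c++)
		dynamic_trampoline(&ops[0], 50);
	return ops[0].checks;
}
```

With 5 users, the shared path does 5 lookups per traced call while the
per-user path does none. A static trampoline placed in front of the dynamic
one would have to make that same "which user is this?" decision, which is
the lookup the dynamic trampoline exists to avoid.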

> feels like all the RCU infrastructure is already there to handle preemption
> within a trampoline and it would be cool if the trampoline were as shown
> above for the dynamically allocated trampolines. At least I feel it will be
> faster than the pre-trampoline code that did the hash lookups / matching to
> call the right function callbacks, and could help eliminate the need for the
> RCU-tasks subsystem and its kthread then.

I don't see how the static trampoline would know which dynamic one to call. Do we
create a static trampoline for every function that is traced and never
delete it? That's a lot of memory.

Also, we trace rcu_read_lock/unlock(), and I use that for a lot of
debugging. And we also need to deal with tracing code that RCU does not
watch, because function tracing does a lot of that too. I finally gave
up trying to have the stack tracer trace those locations, because it
was a serious game of whack-a-mole that would never end. I don't want
to give up full function tracing for the same reason.

>
> If you still feel it's not worth it, then that's okay too and clearly
> RCU-tasks has benefits such as a simpler trampoline implementation..

If you are worried about making RCU simpler, we can go to my original
thought, which was to make a home-grown RCU-like system that we can use,
as this has different requirements than normal RCU has. Like we don't
need a "lock" at all. We just need guaranteed quiescent points that we
make sure all tasks would go through before freeing the trampolines.
But it was decided to create a new flavor of RCU instead of doing that.
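
As a rough illustration of the "no lock, just quiescent points" idea, here
is a minimal userspace sketch. It is not the kernel's Tasks RCU
implementation. All names (struct task, gp_start, gp_done,
task_voluntary_switch) are hypothetical, and it assumes a voluntary context
switch is the quiescent point, since a task cannot voluntarily switch while
it is executing on a trampoline. For simplicity it waits on every task,
without skipping tasks that were already sleeping when the grace period
started.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NTASKS 3

/*
 * Note there is no field marking a task as "on a trampoline": the tracing
 * side takes no lock and records nothing. Only quiescent points are counted.
 */
struct task {
	unsigned long nvcsw;    /* voluntary context switches so far */
};

struct grace_period {
	unsigned long snap[NTASKS];   /* per-task counts at grace-period start */
};

/* Called after the trampoline is unhooked, so no new task can enter it. */
void gp_start(struct grace_period *gp, const struct task *tasks)
{
	for (size_t i = 0; i < NTASKS; i++)
		gp->snap[i] = tasks[i].nvcsw;
}

/* Done once every task has passed at least one quiescent point since start. */
bool gp_done(const struct grace_period *gp, const struct task *tasks)
{
	for (size_t i = 0; i < NTASKS; i++)
		if (tasks[i].nvcsw == gp->snap[i])
			return false;
	return true;
}

/* A task reaching a quiescent point (e.g. it blocks or yields voluntarily). */
void task_voluntary_switch(struct task *t)
{
	t->nvcsw++;
}

/*
 * Scenario: start a grace period, let the first `nswitched` tasks reach a
 * quiescent point, then report whether the trampoline may be freed (1) or
 * not yet (0).
 */
int can_free_after(size_t nswitched)
{
	struct task tasks[NTASKS] = {{0}};
	struct grace_period gp;

	gp_start(&gp, tasks);
	for (size_t i = 0; i < nswitched && i < NTASKS; i++)
		task_voluntary_switch(&tasks[i]);
	return gp_done(&gp, tasks);
}
```

In this sketch a task that was preempted on the trampoline simply keeps the
grace period open until it runs again and reaches a voluntary switch, so
freeing is only allowed once all three tasks have done so.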

-- Steve
