Re: Performance impact of CONFIG_FUNCTION_TRACER

From: Sascha Hauer
Date: Thu Jul 14 2022 - 05:10:35 EST


On Tue, Jul 05, 2022 at 06:27:46PM -0400, Steven Rostedt wrote:
> On Tue, 5 Jul 2022 23:59:48 +0200
> Sascha Hauer <sha@xxxxxxxxxxxxxx> wrote:
>
> > >
> > > I believe that, because ARM uses a link register for function calls,
> > > it requires adding two 4-byte nops to every function, whereas x86
> > > only adds a single 5-byte nop.
> > >
> > > Although nops are very fast (they should not be processed in the CPU's
> > > pipeline, but I don't know if that's true for every arch), they also
> > > affect instruction cache misses, as adding 8 bytes around the code
> > > will cause more cache misses than when they do not exist.
> >
> > Just dug around a bit and saw that on ARM it's not even a real nop.
> > The compiler emits:
> >
> > push {lr}
> > bl 8010e7c0 <__gnu_mcount_nc>
> >
> > Which is then turned into a nop by replacing the second instruction with
> >
> > add sp, sp, #4
> >
> > to bring the stack pointer back to its original value. This indeed must
> > be processed by the CPU pipeline. I wonder if that could be optimized by
> > replacing both instructions with a nop. I have no idea though if that's
> > feasible at all or if the overhead would even get smaller by that.
>
> The problem is that there's no easy way to do that, because a task
> could have been preempted after doing the 'push {lr}' and before the
> 'bl'. Thus, you create a race by changing either one to a nop first.
>
> I wonder if it would have been better to change the first one to a jump
> past the second :-/

I gave this a try, but performance was no better than with the stack
push/pop operations we have now. I also tried replacing both
instructions with nops (mov r0, r0); that did not improve performance
either. I guess we have to live with it then.
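
For reference, the three patch-site variants discussed above can be
sketched roughly as below. This is only an illustration, not the code I
tested and not kernel code: it assumes ARM (A32) mode (Thumb-2 kernels
use different encodings), and the names a32_branch(), struct patch_site
and make_variants() are made up for this sketch.

```c
#include <stdint.h>

/* A32 encodings; ARM mode assumed, Thumb-2 differs. */
#define INSN_PUSH_LR   0xe52de004u /* push {lr}  (str lr, [sp, #-4]!) */
#define INSN_ADD_SP_4  0xe28dd004u /* add sp, sp, #4 */
#define INSN_MOV_R0_R0 0xe1a00000u /* mov r0, r0 (used as a nop) */

/* Hypothetical helper: encode an unconditional A32 "b" from pc to target. */
static uint32_t a32_branch(uint32_t pc, uint32_t target)
{
    /* In A32 the PC reads as the address of the instruction plus 8. */
    int32_t off = (int32_t)(target - (pc + 8));

    /* The 24-bit immediate holds the word offset (byte offset / 4). */
    return 0xea000000u | (((uint32_t)off >> 2) & 0x00ffffffu);
}

/* The two instruction words at a function's mcount call site. */
struct patch_site {
    uint32_t insn[2];
};

/*
 * Fill in the three "tracing disabled" variants for a call site at
 * address pc whose original second word is bl_insn
 * (the "bl __gnu_mcount_nc" emitted by the compiler).
 */
static void make_variants(uint32_t pc, uint32_t bl_insn,
                          struct patch_site out[3])
{
    /* 1) What we have now: keep the push, undo it with an add. */
    out[0].insn[0] = INSN_PUSH_LR;
    out[0].insn[1] = INSN_ADD_SP_4;

    /* 2) Branch over the bl: only the first word changes. */
    out[1].insn[0] = a32_branch(pc, pc + 8);
    out[1].insn[1] = bl_insn;

    /* 3) Two nops: both words change (racy, as explained above). */
    out[2].insn[0] = INSN_MOV_R0_R0;
    out[2].insn[1] = INSN_MOV_R0_R0;
}
```

With these assumptions, a32_branch(0x1000, 0x1008) gives 0xea000000,
i.e. a branch that skips exactly one instruction.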

Sascha

--
Pengutronix e.K. | |
Steuerwalder Str. 21 | http://www.pengutronix.de/ |
31137 Hildesheim, Germany | Phone: +49-5121-206917-0 |
Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |