RE: [PATCH -next V7 0/7] riscv: Optimize function trace

From: David Laight
Date: Wed Feb 08 2023 - 17:29:50 EST


> > # Note: aligned to 8 bytes
> > addr-08 // Literal (first 32-bits) // patched to ops ptr
> > addr-04 // Literal (last 32-bits) // patched to ops ptr
> > addr+00 func: mv t0, ra
> We needn't "mv t0, ra" here because our "jalr" could work with t0 and
> won't affect ra. Let's do it in the trampoline code, and then we can
> save another word here.
> > addr+04 auipc t1, ftrace_caller
> > addr+08 jalr ftrace_caller(t1)

Is that some kind of 'load high' and 'add offset' pair?
I guess 64-bit kernels guarantee that all module code
is placed within +-2G of the main kernel?
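
For reference, a minimal sketch of how such a pair splits an offset,
assuming the standard RISC-V behaviour: auipc adds a 20-bit immediate
shifted left by 12 to the pc, and jalr adds a sign-extended 12-bit
immediate. The struct and function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Immediates for an auipc/jalr pair (names are hypothetical). */
struct pcrel_imm {
    int32_t hi20;   /* auipc immediate, in units of 4 KiB */
    int32_t lo12;   /* jalr immediate, in [-2048, 2047] */
};

/* Split a pc-relative byte offset into the two immediates. */
static struct pcrel_imm split_pcrel(int64_t offset)
{
    struct pcrel_imm imm;
    int64_t lo;

    /* Reachable range is [-2^31 - 2^11, 2^31 - 2^11 - 1]. */
    assert(offset >= -(INT64_C(1) << 31) - 2048);
    assert(offset <= (INT64_C(1) << 31) - 2049);

    /* Sign-extend the low 12 bits, since jalr sign-extends them. */
    lo = (int64_t)(((uint64_t)offset & 0xfff) ^ 0x800) - 0x800;

    imm.lo12 = (int32_t)lo;
    /* offset - lo is a multiple of 4096, so this division is exact. */
    imm.hi20 = (int32_t)((offset - lo) / 4096);
    return imm;
}

/* Recombine the immediates the way the hardware would. */
static int64_t join_pcrel(struct pcrel_imm imm)
{
    return (int64_t)imm.hi20 * 4096 + imm.lo12;
}
```

The asymmetric range comes from the sign extension of the low part:
an offset whose low 12 bits are 0x800 or more needs the high part
rounded up by one.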

> Here is the call-site:
> # Note: aligned to 8 bytes
> addr-08 // Literal (first 32-bits) // patched to ops ptr
> addr-04 // Literal (last 32-bits) // patched to ops ptr
> addr+00 auipc t0, ftrace_caller
> addr+04 jalr ftrace_caller(t0)

Could you even do something like:
addr-n call ftrace-function
addr-n+x literals
addr+0 nop or jmp addr-n
addr+4 function_code
So that all the code executed when tracing is enabled
is before the label and only one 'nop' is in the body.
The called code can use the return address to find the
literals and then modify it to return to addr+4.
The code cost when trace is enabled is probably irrelevant
here - dominated by what happens later.
It probably isn't even worth aligning a 64-bit constant.
Doing two reads probably won't be noticeable.
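
A rough sketch of what the called code would do with the return
address, modelled in plain C on a byte buffer. The exact layout is an
assumption made only for this example: an 8-byte call sequence at
addr-16, the literal at addr-8 (low half first), and a 4-byte nop/jmp
at addr+0. The names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Assumed layout, offsets relative to the function label:
 *   -16  call sequence (8 bytes)
 *    -8  literal, low 32 bits
 *    -4  literal, high 32 bits
 *    +0  nop, or jmp back to -16 when tracing
 *    +4  function code
 * With this layout the return address left by the call points at
 * the literal.
 */
enum { LITERAL_SIZE = 8, PATCH_SLOT_SIZE = 4 };

/* Read the 64-bit literal as two 32-bit halves (4-byte alignment is enough). */
static uint64_t read_literal(const unsigned char *ra)
{
    uint32_t lo, hi;

    assert(ra != NULL);
    memcpy(&lo, ra, sizeof(lo));
    memcpy(&hi, ra + 4, sizeof(hi));
    return ((uint64_t)hi << 32) | lo;
}

/* Address to return to: skip the literal and the nop/jmp at the label. */
static const unsigned char *resume_address(const unsigned char *ra)
{
    assert(ra != NULL);
    return ra + LITERAL_SIZE + PATCH_SLOT_SIZE;
}
```

With the literal directly after the call, the return address doubles
as the pointer to the literal, so no extra register is needed to find
it.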

What you do want to ensure is that the initial patch is
overwriting nop - just in case the gap isn't there.
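
Something like the following check, as a sketch only. It assumes the
uncompressed RISC-V nop encoding (addi x0, x0, 0 = 0x00000013); the
helper name is hypothetical, and a real kernel patch would also need
the usual text-patching and icache synchronisation, which is not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RISCV_NOP 0x00000013u   /* addi x0, x0, 0 */

/*
 * Replace the instruction word only if it currently holds a nop.
 * Returns false and leaves the word untouched otherwise.
 */
static bool patch_if_nop(uint32_t *slot, uint32_t new_insn)
{
    assert(slot != NULL);

    if (*slot != RISCV_NOP)
        return false;   /* not the expected padding, don't overwrite */

    *slot = new_insn;
    return true;
}
```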

David
