Now, this was no easy task. We needed to add a section to every object
file with a list of pointers to the call sites to mcount. The idea I came
up with was to make a tmp.s file for every object just after it is compiled.
This tmp.s would then be compiled and relinked into the original object.
The tmp.s file would have something like:

    .section __mcount_loc,"a",@progbits
    .quad location_of_mcount1
    .quad location_of_mcount2
    (etc)

By running objdump on the object file we can find the offsets into the
sections at which mcount is called.
For example, looking at hrtimer.o:

    Disassembly of section .text:

    0000000000000000 <hrtimer_init_sleeper>:
       0:   55                      push   %rbp
       1:   48 89 e5                mov    %rsp,%rbp
       4:   e8 00 00 00 00          callq  9 <hrtimer_init_sleeper+0x9>
                            5: R_X86_64_PC32        mcount+0xfffffffffffffffc
    [...]

The '5' in '5: R_X86_64_PC32' is the offset at which the mcount relocation
is applied for the call site. This offset is from the start of the .text
section, and not necessarily from the start of the function. If we look
further we see:

    000000000000001e <ktime_add_safe>:
      1e:   55                      push   %rbp
      1f:   48 89 e5                mov    %rsp,%rbp
      22:   e8 00 00 00 00          callq  27 <ktime_add_safe+0x9>
                            23: R_X86_64_PC32       mcount+0xfffffffffffffffc

This mcount call site is at offset 0x23 from the start of the .text section,
and obviously not from the start of ktime_add_safe (that would be 0x5).
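The offset hunting described above can be sketched in a few lines of Python.
This is only an illustration, not the script that ships with the patch: the
function name find_mcount_sites, the regular expressions, and the returned
dictionary layout are all assumptions made for this example, and the input is
assumed to be the text printed by "objdump -dr" in the format shown above.

```python
import re

# "Disassembly of section .text:"
SECTION_RE = re.compile(r"^Disassembly of section (\S+):")
# "0000000000000000 <hrtimer_init_sleeper>:"
FUNC_RE = re.compile(r"^([0-9a-fA-F]+)\s+<([^>]+)>:")
# "5: R_X86_64_PC32 mcount+0xfffffffffffffffc"
RELOC_RE = re.compile(r"^\s*([0-9a-fA-F]+):\s+R_\S+\s+mcount\b")

def find_mcount_sites(objdump_text):
    """Map each section to its first function and its mcount reloc offsets.

    Hypothetical helper: returns {section: {"ref": (name, addr), "offsets": [...]}}
    where every offset is relative to the start of the section.
    """
    sites = {}
    current = None
    for line in objdump_text.splitlines():
        m = SECTION_RE.match(line)
        if m:
            current = sites.setdefault(m.group(1), {"ref": None, "offsets": []})
            continue
        if current is None:
            continue
        m = FUNC_RE.match(line)
        if m:
            # Remember only the first function seen in the section.
            if current["ref"] is None:
                current["ref"] = (m.group(2), int(m.group(1), 16))
            continue
        m = RELOC_RE.match(line)
        if m:
            current["offsets"].append(int(m.group(1), 16))
    return sites

SAMPLE = """\
Disassembly of section .text:
0000000000000000 <hrtimer_init_sleeper>:
   0: 55 push %rbp
   1: 48 89 e5 mov %rsp,%rbp
   4: e8 00 00 00 00 callq 9 <hrtimer_init_sleeper+0x9>
      5: R_X86_64_PC32 mcount+0xfffffffffffffffc
000000000000001e <ktime_add_safe>:
  1e: 55 push %rbp
  1f: 48 89 e5 mov %rsp,%rbp
  22: e8 00 00 00 00 callq 27 <ktime_add_safe+0x9>
      23: R_X86_64_PC32 mcount+0xfffffffffffffffc
"""

if __name__ == "__main__":
    print(find_mcount_sites(SAMPLE))
```

Keeping the first function of each section alongside the offsets matters for
the next step, where that function serves as the reference symbol.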
If we make a tmp.s that has the following:

    .section __mcount_loc,"a",@progbits
    .quad hrtimer_init_sleeper + 0x5
    .quad hrtimer_init_sleeper + 0x23

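We have a section with the locations of these two call sites. Both entries
are written relative to hrtimer_init_sleeper because it sits at offset 0 of
.text, so a section offset is also an offset from that symbol. After the
final linking, they will point to the actual addresses used.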
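Generating such a tmp.s from a list of section offsets could look roughly
like the following. Again this is a sketch under assumptions: the function
name make_mcount_loc_asm and its parameters are made up for this example, and
the "word" parameter is there because a 32-bit target would presumably need
a 4-byte directive such as .long instead of .quad.

```python
def make_mcount_loc_asm(ref_symbol, ref_addr, offsets, word=".quad"):
    """Build the text of a tmp.s holding one pointer per mcount call site.

    ref_symbol / ref_addr: a function in the section and its section offset
    (hypothetical parameters; normally the first function, at offset 0).
    offsets: section-relative offsets of the mcount relocations.
    """
    lines = ['\t.section __mcount_loc,"a",@progbits']
    for off in offsets:
        # Express each site as "symbol + distance from that symbol".
        lines.append(f"\t{word} {ref_symbol} + {off - ref_addr:#x}")
    return "\n".join(lines) + "\n"

if __name__ == "__main__":
    print(make_mcount_loc_asm("hrtimer_init_sleeper", 0, [0x5, 0x23]), end="")
```

With the two call sites from hrtimer.o this reproduces the three-line tmp.s
shown above.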
All that would need to be done is:

    gcc -c tmp.s -o tmp.o
    ld -r tmp.o hrtimer.o -o tmp_hrtimer.o
    mv tmp_hrtimer.o hrtimer.o

Easy as that! Not quite. What happens if that first function in the
section is a static function? That is, the symbol for the function
is local to the object. If for some reason hrtimer_init_sleeper is static,
tmp_hrtimer.o would end up with two symbols named hrtimer_init_sleeper:
the local one defined in hrtimer.o and an undefined global one referenced
from tmp.o. The linker will not resolve the global reference against the
local definition.
But we can be even more evil with this idea. We can do crazy things
with objcopy to solve it for us.

    objcopy --globalize-symbol hrtimer_init_sleeper hrtimer.o tmp_hrtimer.o

Now hrtimer_init_sleeper is global for linking.

    ld -r tmp_hrtimer.o tmp.o -o tmp2_hrtimer.o

Now tmp.o can use the same global hrtimer_init_sleeper symbol.
But tmp2_hrtimer.o, which holds both the tmp.o and tmp_hrtimer.o symbols,
still exports hrtimer_init_sleeper as a global, and we can't just blindly
convert local symbols to globals.
The solution is simply to put it back to local.

    objcopy --localize-symbol hrtimer_init_sleeper tmp2_hrtimer.o hrtimer.o

Now our hrtimer.o file has our __mcount_loc section and the
reference to hrtimer_init_sleeper will be resolved.
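To make the two cases easier to compare, here is a small Python sketch that
only builds the command lines for each path and does not run anything. The
function name relink_commands, its parameters, and the assumption that the
object file is given as a bare file name in the current directory are all
hypothetical; the commands themselves are the ones listed above.

```python
def relink_commands(obj, ref_symbol, ref_is_local,
                    tmp_s="tmp.s", tmp_o="tmp.o"):
    """Return the argv lists needed to merge tmp.s into obj.

    Assumes obj is a bare file name such as "hrtimer.o" (illustration only).
    """
    cmds = [["gcc", "-c", tmp_s, "-o", tmp_o]]
    if not ref_is_local:
        # Simple case: the reference symbol is already global.
        merged = "tmp_" + obj
        cmds.append(["ld", "-r", tmp_o, obj, "-o", merged])
        cmds.append(["mv", merged, obj])
        return cmds
    # Static reference symbol: globalize, link, then localize again.
    globalized = "tmp_" + obj
    merged = "tmp2_" + obj
    cmds.append(["objcopy", "--globalize-symbol", ref_symbol, obj, globalized])
    cmds.append(["ld", "-r", globalized, tmp_o, "-o", merged])
    cmds.append(["objcopy", "--localize-symbol", ref_symbol, merged, obj])
    return cmds

if __name__ == "__main__":
    for cmd in relink_commands("hrtimer.o", "hrtimer_init_sleeper", True):
        print(" ".join(cmd))
```

Each returned list could be handed to subprocess.run(cmd, check=True) to
actually execute the step.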
This is a bit complex to do in shell scripting and Makefiles, so I wrote
a well-documented Perl script, recordmcount.pl, that does all of the above
in one place.
With this new update, we can work to kill that kernel thread "ftraced"!
This patch set ports to x86_64 and i386. The other archs will still use
the daemon until they are converted over.
I tested this on both x86_64 and i386 with and without CONFIG_RELOCATABLE
set.