Re: [PATCH v2 15/39] x86/ibt,kprobes: Fix more +0 assumptions

From: Naveen N. Rao
Date: Thu Mar 03 2022 - 07:12:23 EST


Peter Zijlstra wrote:
On Wed, Mar 02, 2022 at 08:32:45PM +0100, Peter Zijlstra wrote:
I wonder if you also want to tighten up on_func_entry? Wouldn't the
above suggest something like:

Good question ;)
I noticed this yesterday, but held off on making changes so that I can think this through.


kprobe_opcode_t *arch_adjust_kprobe_addr(unsigned long addr, unsigned long offset,
bool *on_func_entry)
{
#ifdef PPC64_ELF_ABI_V2
unsigned long entry = ppc_function_entry((void *)addr) - addr;
*on_func_entry = !offset || offset == entry;
if (*on_func_entry)
offset = entry;
#else
*on_func_entry = !offset;
#endif
return (void *)(addr + offset);
}

This is more accurate and probably something we should do in the long term. The main issue I see today is userspace, specifically perf (and perhaps others too).

Historically, we have worked around the issue of probes on a function's global entry point not working by having userspace adjust the offset at which probes are placed. This works well if those object files have either the symbol table, or debuginfo capturing if functions have a separate local entry point. In the absence of those, we are left guessing and we chose to just offset all probes at function entry by 8 (GEP almost always has the same two instructions) so that perf "just works". This still works well for functions without a GEP since we expect to see the two ftrace instructions at function entry, so we are ok to probe after that. As an added bonus, this also allows uprobes to work, for the most part.

On the kernel side, we only implemented logic to adjust probe address if a function name was specified without an offset. This went for a toss once perf probe moved to using _text as the base symbol for kprobes though, and we weren't handling scenarios where addr+offset was provided. With the changes in this series, we can now adjust kprobe address across all those scenarios properly.

If we update perf to not pass an offset any more, then newer perf will stop working on older kernels. If we make the logic to determine function entry strict in the kernel, then we risk breaking existing userspace.

I'm not sure how best to address this.


One question though; the above seems to work for +0 or +8 (IIRC your
instructions are 4 bytes each and the GEP is 2 instructions).

But what do we want to happen for +4 ?

We don't want to change the behavior of probes at the second instruction in GEP. The thinking is that it allows the rare scenario (if at all) of wanting to catch indirect function calls, and/or cross-module function calls -- especially since we now promote probes at GEP to LEP. I frankly know of no such scenarios so far, but in any case, if the user is specifying an offset, they better know what they are asking for :)

For the same reason, we should allow kretprobe at +4.


Thanks,
Naveen