Re: [PATCH v3 0/6] Static calls

From: Jann Horn
Date: Mon Feb 17 2020 - 16:10:58 EST


On Thu, Jan 10, 2019 at 9:52 PM Josh Poimboeuf <jpoimboe@xxxxxxxxxx> wrote:
> On Thu, Jan 10, 2019 at 09:30:23PM +0100, Peter Zijlstra wrote:
> > On Wed, Jan 09, 2019 at 04:59:35PM -0600, Josh Poimboeuf wrote:
> > > With this version, I stopped trying to use text_poke_bp(), and instead
> > > went with a different approach: if the call site destination doesn't
> > > cross a cacheline boundary, just do an atomic write. Otherwise, keep
> > > using the trampoline indefinitely.
> >
> > > - Get rid of the use of text_poke_bp(), in favor of atomic writes.
> > > Out-of-line calls will be promoted to inline only if the call sites
> > > don't cross cache line boundaries. [Linus/Andy]
> >
> > Can we preserve why text_poke_bp() didn't work? I seem to have forgotten
> > again. The problem was poking the return address onto the stack from the
> > int3 handler, or something along those lines?
>
> Right, emulating a call instruction from the #BP handler is ugly,
> because you have to somehow grow the stack to make room for the return
> address. Personally I liked the idea of shifting the iret frame by 16
> bytes in the #DB entry code, but others hated it.
>
> So many bad-but-not-completely-unacceptable options to choose from.
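Josh's atomic-write condition can be sketched as a check on the call site's address. This is a minimal sketch, not the patch's code. It assumes a 64-byte cacheline and assumes that only the 32-bit displacement following the 1-byte `e8` opcode of a 5-byte `call rel32` gets rewritten. `call_disp_crosses_cacheline()` is a hypothetical helper.

```c
#include <stdbool.h>
#include <stdint.h>

#define CACHELINE_SIZE   64u /* assumed x86 cacheline size */
#define CALL_DISP_OFFSET 1u  /* rel32 follows the 1-byte e8 opcode */
#define CALL_DISP_SIZE   4u  /* size of the rel32 displacement */

/* Hypothetical helper: does the rel32 field of the call at 'insn' span two cachelines? */
static bool call_disp_crosses_cacheline(uintptr_t insn)
{
	uintptr_t first = insn + CALL_DISP_OFFSET;
	uintptr_t last = first + CALL_DISP_SIZE - 1;

	/* the field crosses a boundary iff its first and last bytes are in different lines */
	return first / CACHELINE_SIZE != last / CACHELINE_SIZE;
}
```

For example, a call starting 5 bytes before a 64-byte boundary keeps its displacement within one line, one starting 2 to 4 bytes before splits it, and one starting 1 byte before has its displacement entirely in the next line.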

Silly suggestion from someone who has skimmed the thread:

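Wouldn't a retpoline-style trampoline solve this without needing
memory allocations? Let the int3 handler stash the destination in a
percpu variable, clear IF in regs->flags so that no maskable interrupt
can overwrite that variable before the trampoline reads it, and return
to the trampoline. Something like: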

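void simulate_call(struct pt_regs *regs, unsigned long target)
{
	/* remember whether the interrupted context had IRQs enabled */
	__this_cpu_write(static_call_restore_if, (regs->flags & X86_EFLAGS_IF) != 0);
	/* iret with IF clear so no maskable IRQ can clobber the percpu slots */
	regs->flags &= ~X86_EFLAGS_IF;
	/* regs->ip is just past the 1-byte int3 over the 5-byte call */
	__this_cpu_write(static_call_trampoline_source, regs->ip - 1 + 5);
	__this_cpu_write(static_call_trampoline_target, target);
	regs->ip = (unsigned long)magic_static_call_trampoline;
}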

magic_static_call_trampoline:
	/* set up return address for returning from target function */
	pushq PER_CPU_VAR(static_call_trampoline_source)
	/* set up retpoline-style return address */
	pushq PER_CPU_VAR(static_call_trampoline_target)
	/* restore IF if needed (static_call_restore_if assumed to be a 1-byte bool) */
	cmpb $0, PER_CPU_VAR(static_call_restore_if)
	je 1f
	sti	/* NOTE: percpu data must not be accessed past this point */
1:
	ret	/* "return" to the call target */
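The interrupt-flag handshake between the handler and the trampoline can be modelled in userspace to check that IF ends up as it was before the int3. This is a sketch under stated assumptions: `struct fake_regs`, `restore_if`, `tramp_target`, `model_handler()` and `model_trampoline_if()` are hypothetical stand-ins for `struct pt_regs`, the percpu variables, `simulate_call()` and the `cmp`/`je`/`sti` sequence. Only the flag handling is modelled, not the stack.

```c
#include <stdbool.h>

#define X86_EFLAGS_IF 0x200ul /* interrupt-enable flag, bit 9 of EFLAGS */

/* Hypothetical stand-ins for the saved registers and the per-CPU slots. */
struct fake_regs {
	unsigned long ip, flags;
};
static bool restore_if;
static unsigned long tramp_target;

/* Handler side: remember IF, clear it, and redirect the iret. */
static void model_handler(struct fake_regs *regs, unsigned long target,
			  unsigned long trampoline)
{
	restore_if = (regs->flags & X86_EFLAGS_IF) != 0;
	regs->flags &= ~X86_EFLAGS_IF; /* trampoline starts with IRQs off */
	tramp_target = target;
	regs->ip = trampoline;         /* iret lands in the trampoline */
}

/* Trampoline side, flags only: the sti runs only if IF was set before. */
static void model_trampoline_if(struct fake_regs *regs)
{
	if (restore_if)
		regs->flags |= X86_EFLAGS_IF;
}
```

Keeping IF clear between the iret and the percpu reads in the trampoline is what lets a single set of percpu slots suffice: no maskable interrupt can run on that CPU in between and overwrite them.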

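By using a return to implement the call, we don't need any scratch
registers for the call: the ret reads the target address from the
stack, so every register still holds what the caller set up when the
target starts running.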
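The claim that a ret can stand in for the call can be checked with a small userspace model. After the two pushes and the ret, the instruction pointer and stack look exactly as if `call target` had executed at the call site, and no general-purpose register was used to hold the target. `struct cpu`, `real_call()` and `ret_as_call()` are hypothetical names for this sketch. It ignores IF handling and the percpu reads.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a downward-growing stack plus the registers involved. */
struct cpu {
	uint64_t ip, sp;
	uint64_t gpr[16]; /* general-purpose registers; must stay untouched */
	uint64_t mem[64]; /* stack memory, indexed by sp (in 8-byte slots) */
};

static void push(struct cpu *c, uint64_t v) { c->mem[--c->sp] = v; }
static uint64_t pop(struct cpu *c) { return c->mem[c->sp++]; }

/* What a real "call target" at the call site would do. */
static void real_call(struct cpu *c, uint64_t ret_addr, uint64_t target)
{
	push(c, ret_addr);
	c->ip = target;
}

/* The trampoline: push source, push target, then "ret" into the target. */
static void ret_as_call(struct cpu *c, uint64_t source, uint64_t target)
{
	push(c, source);
	push(c, target);
	c->ip = pop(c); /* ret: pops the target, leaving source on top */
}
```

Both paths leave `ip` at the target and the return address on top of the stack, which is why the target's own `ret` goes back to the instruction after the patched call site.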