Re: [PATCH 3/4] x86/ftrace: make ftrace_int3_handler() not to skip fops invocation

From: Linus Torvalds
Date: Mon Apr 29 2019 - 14:15:27 EST


On Sun, Apr 28, 2019 at 10:38 AM Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:
>
> For optimization reasons, if there's only a single user of a function
> it gets its own trampoline that sets up the call to its callback and
> calls that callback directly:

So this is the same issue as the static calls, and it has exactly the
same solution.

Which I already outlined once, and nobody wrote the code for.

So here's a COMPLETELY UNTESTED patch that only works (_if_ it works) for

(a) 64-bit

(b) SMP

but that's just because I've hardcoded the percpu segment handling.

It does *not* emulate the "call" in the BP handler itself, instead it
replaces the %ip (the same way all the other BP handlers replace the
%ip) with a code sequence that just does

    push %gs:bp_call_return
    jmp *%gs:bp_call_target

after having filled in those per-cpu things.

The reason they are percpu is that after the %ip has been changed, the
target CPU goes its merry way, and doesn't wait for the text-poke
semaphore serialization. But since we have interrupts disabled on that
CPU, we know that *another* text poke won't be coming around and
changing the values.
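
To make the control flow above concrete, here is a minimal userspace sketch that models one CPU hitting the breakpoint. It is not kernel code: `struct sim_cpu`, `sim_int3_handler()` and `sim_run_stub()` are hypothetical names invented for this illustration, the per-cpu slots are plain struct fields, and the stack is a small array. It only mirrors the sequence described above: the handler fills the per-cpu slots and redirects %ip, then the stub pushes the return address and jumps to the target.

```c
#include <assert.h>
#include <stdint.h>

#define SIM_STACK_WORDS 8

/* Which stub the handler pointed the simulated %ip at. */
enum sim_stub { SIM_STUB_PLAIN, SIM_STUB_STI };

/* Hypothetical model of one CPU; not a kernel structure. */
struct sim_cpu {
	uint64_t ip;
	uint64_t stack[SIM_STACK_WORDS];
	int sp;                 /* number of words currently pushed */
	int irqs_on;            /* models X86_EFLAGS_IF */
	uint64_t call_return;   /* models per-cpu bp_call_return */
	uint64_t call_target;   /* models per-cpu bp_call_target */
	enum sim_stub stub;
};

/* Models the int3 handler: fill the per-cpu slots, pick a stub. */
static void sim_int3_handler(struct sim_cpu *cpu, uint64_t target, uint64_t ret)
{
	cpu->call_target = target;
	cpu->call_return = ret;
	cpu->stub = SIM_STUB_PLAIN;

	/* If interrupts were on, turn them off and use the stub that does "sti". */
	if (cpu->irqs_on) {
		cpu->irqs_on = 0;
		cpu->stub = SIM_STUB_STI;
	}
}

/* Models the stub that runs after the handler returns. */
static void sim_run_stub(struct sim_cpu *cpu)
{
	assert(cpu->sp < SIM_STACK_WORDS);
	cpu->stack[cpu->sp++] = cpu->call_return;   /* push %gs:bp_call_return */
	if (cpu->stub == SIM_STUB_STI)
		cpu->irqs_on = 1;                   /* sti */
	cpu->ip = cpu->call_target;                 /* jmp *%gs:bp_call_target */
}
```

In this model the slots are only read while the simulated interrupt flag is clear, which is the property the per-cpu argument above relies on.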

THIS IS ENTIRELY UNTESTED! I've built it, and it at least seems to
build, although with warnings

    arch/x86/kernel/alternative.o: warning: objtool: emulate_call_irqoff()+0x9: indirect jump found in RETPOLINE build
    arch/x86/kernel/alternative.o: warning: objtool: emulate_call_irqon()+0x8: indirect jump found in RETPOLINE build
    arch/x86/kernel/alternative.o: warning: objtool: emulate_call_irqoff()+0x9: sibling call from callable instruction with modified stack frame
    arch/x86/kernel/alternative.o: warning: objtool: emulate_call_irqon()+0x8: sibling call from callable instruction with modified stack frame

that will need the appropriate "ignore this case" annotations that I didn't do.

Do I expect it to work? No. I'm sure there's some silly mistake here,
but the point of the patch is to show it as an example, so that it can
actually be tested.

With this, it should be possible (under the text rewriting lock) to do

    replace_call(callsite, newcallopcode, callsize, calltarget);

to do the static rewriting of the call at "callsite" to have the new
call target.
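
For illustration, the "newcallopcode" bytes for a direct near call on x86-64 are 0xe8 followed by a 32-bit little-endian displacement relative to the end of the 5-byte instruction. The sketch below builds them in userspace; `make_call_opcode()` and `CALL_INSN_SIZE` are hypothetical names for this example and are not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CALL_INSN_SIZE 5	/* 0xe8 + rel32; name is an assumption for this sketch */

/*
 * Hypothetical helper: encode "call target" as it would sit at callsite.
 * Returns 0 on success, -1 if the target is out of rel32 range.
 */
static int make_call_opcode(uint8_t out[CALL_INSN_SIZE],
			    uint64_t callsite, uint64_t target)
{
	/* Displacement is relative to the address of the next instruction. */
	int64_t diff = (int64_t)(target - (callsite + CALL_INSN_SIZE));
	uint32_t rel;

	assert(out != NULL);
	if (diff < INT32_MIN || diff > INT32_MAX)
		return -1;

	rel = (uint32_t)(int32_t)diff;
	out[0] = 0xe8;			/* CALL rel32 */
	out[1] = rel & 0xff;		/* little-endian displacement */
	out[2] = (rel >> 8) & 0xff;
	out[3] = (rel >> 16) & 0xff;
	out[4] = (rel >> 24) & 0xff;
	return 0;
}
```

The resulting buffer is the kind of thing that would be passed as the opcode argument, with a length of 5 and the same target address as the final argument.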

And again. Untested. But doesn't need any special code in the entry
path, and the concept is simple even if there are probably stupid bugs
just because it's entirely untested.

Oh, and did I mention that I didn't test this?

Linus

 arch/x86/kernel/alternative.c | 54 ++++++++++++++++++++++++++++++++++++++++---
1 file changed, 51 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index 9a79c7808f9c..92b59958cff3 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -739,7 +739,11 @@ static void do_sync_core(void *info)
}

static bool bp_patching_in_progress;
-static void *bp_int3_handler, *bp_int3_addr;
+static void *bp_int3_handler_irqoff, *bp_int3_handler_irqon, *bp_int3_addr;
+static void *bp_int3_call_target, *bp_int3_call_return;
+
+static DEFINE_PER_CPU(void *, bp_call_return);
+static DEFINE_PER_CPU(void *, bp_call_target);

int poke_int3_handler(struct pt_regs *regs)
{
@@ -762,7 +766,22 @@ int poke_int3_handler(struct pt_regs *regs)
 		return 0;

 	/* set up the specified breakpoint handler */
-	regs->ip = (unsigned long) bp_int3_handler;
+	regs->ip = (unsigned long) bp_int3_handler_irqon;
+
+	/*
+	 * If we want an irqoff int3 handler, and interrupts were
+	 * on, we turn them off and use the special irqoff handler
+	 * instead.
+	 */
+	if (bp_int3_handler_irqoff) {
+		this_cpu_write(bp_call_target, bp_int3_call_target);
+		this_cpu_write(bp_call_return, bp_int3_call_return);
+
+		if (regs->flags & X86_EFLAGS_IF) {
+			regs->flags &= ~X86_EFLAGS_IF;
+			regs->ip = (unsigned long) bp_int3_handler_irqoff;
+		}
+	}

 	return 1;
 }
@@ -792,7 +811,7 @@ void *text_poke_bp(void *addr, const void *opcode, size_t len, void *handler)
 {
 	unsigned char int3 = 0xcc;

-	bp_int3_handler = handler;
+	bp_int3_handler_irqon = handler;
 	bp_int3_addr = (u8 *)addr + sizeof(int3);
 	bp_patching_in_progress = true;

@@ -830,7 +849,36 @@ void *text_poke_bp(void *addr, const void *opcode, size_t len, void *handler)
 	 * the writing of the new instruction.
 	 */
 	bp_patching_in_progress = false;
+	bp_int3_handler_irqoff = NULL;

 	return addr;
 }

+extern asmlinkage void emulate_call_irqon(void);
+extern asmlinkage void emulate_call_irqoff(void);
+
+asm(
+	".text\n"
+	".global emulate_call_irqoff\n"
+	".type emulate_call_irqoff, @function\n"
+	"emulate_call_irqoff:\n\t"
+		"push %gs:bp_call_return\n\t"
+		"sti\n\t"
+		"jmp *%gs:bp_call_target\n"
+	".size emulate_call_irqoff, .-emulate_call_irqoff\n"
+
+	".global emulate_call_irqon\n"
+	".type emulate_call_irqon, @function\n"
+	"emulate_call_irqon:\n\t"
+		"push %gs:bp_call_return\n\t"
+		"jmp *%gs:bp_call_target\n"
+	".size emulate_call_irqon, .-emulate_call_irqon\n"
+	".previous\n");
+
+void replace_call(void *addr, const void *opcode, size_t len, void *target)
+{
+	bp_int3_call_target = target;
+	bp_int3_call_return = addr + len;
+	bp_int3_handler_irqoff = emulate_call_irqoff;
+	text_poke_bp(addr, opcode, len, emulate_call_irqon);
+}