Re: [PATCH 08/15] x86/alternatives: Teach text_poke_bp() to emulate instructions

From: Peter Zijlstra
Date: Tue Jun 11 2019 - 08:39:07 EST


On Tue, Jun 11, 2019 at 02:08:34PM +0200, Peter Zijlstra wrote:
> On Tue, Jun 11, 2019 at 10:03:07AM +0200, Peter Zijlstra wrote:
> > On Fri, Jun 07, 2019 at 11:10:19AM -0700, Andy Lutomirski wrote:
>
> > > I am surely missing some kprobe context, but is it really safe to use
> > > this mechanism to replace more than one instruction?
> >
> > I'm not entirely up-to-scratch here, so Masami, please correct me if I'm
> > wrong.
> >
> > So what happens is that arch_prepare_optimized_kprobe() <-
> > copy_optimized_instructions() copies however much of the instruction
> > stream is required such that we can overwrite the instruction at @addr
> > with a 5 byte jump.
> >
> > arch_optimize_kprobe() then does the text_poke_bp() that replaces the
> > instruction @addr with int3, copies the rel jump address and overwrites
> > the int3 with jmp.
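
For reference, roughly what that arch_optimize_kprobe() step looks like
(simplified sketch from memory, not the literal code in
arch/x86/kernel/kprobes/opt.c; error handling and the unoptimize path
elided):

	/*
	 * Simplified sketch: build a 5 byte "jmp rel32" to the trampoline
	 * slot and hand it to text_poke_bp(), with the slot itself as the
	 * INT3 fallback target.
	 */
	static void optimize_kprobe_sketch(struct optimized_kprobe *op)
	{
		u8 insn_buf[RELATIVEJUMP_SIZE];
		s32 rel = (s32)((long)op->optinsn.insn -
				((long)op->kp.addr + RELATIVEJUMP_SIZE));

		insn_buf[0] = RELATIVEJUMP_OPCODE;	/* 0xe9 */
		*(s32 *)(&insn_buf[1]) = rel;

		text_poke_bp(op->kp.addr, insn_buf, RELATIVEJUMP_SIZE,
			     op->optinsn.insn);
	}
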
> >
> > And I'm thinking the problem is with something like:
> >
> > @addr: nop nop nop nop nop
> >
> > We copy out the nops into the trampoline, overwrite the first nop with
> > an INT3, overwrite the remaining nops with the rel addr, but oops,
> > another CPU can still be executing one of those NOPs, right?
> >
> > I'm thinking we could fix this by first writing INT3 into all relevant
> > instructions, which is going to be messy, given the current code base.
>
> Maybe not that bad; how's something like this?
>
> (completely untested)
>
> ---
> arch/x86/kernel/alternative.c | 20 ++++++++++++++++++--
> 1 file changed, 18 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
> index 0d57015114e7..8f643dabea72 100644
> --- a/arch/x86/kernel/alternative.c
> +++ b/arch/x86/kernel/alternative.c
> @@ -24,6 +24,7 @@
> #include <asm/tlbflush.h>
> #include <asm/io.h>
> #include <asm/fixmap.h>
> +#include <asm/insn.h>
>
> int __read_mostly alternatives_patched;
>
> @@ -849,6 +850,7 @@ static void do_sync_core(void *info)
>
> static bool bp_patching_in_progress;
> static void *bp_int3_handler, *bp_int3_addr;
> +static unsigned int bp_int3_length;
>
> int poke_int3_handler(struct pt_regs *regs)
> {
> @@ -867,7 +869,11 @@ int poke_int3_handler(struct pt_regs *regs)
> if (likely(!bp_patching_in_progress))
> return 0;
>
> - if (user_mode(regs) || regs->ip != (unsigned long)bp_int3_addr)
> + if (user_mode(regs))
> + return 0;
> +
> + if (regs->ip < (unsigned long)bp_int3_addr ||
> + regs->ip >= (unsigned long)bp_int3_addr + bp_int3_length)
> return 0;

Bugger, this isn't right. It'll jump to the beginning of the trampoline
even if the INT3 was hit multiple instructions in, which would lead to
executing the earlier instructions twice, which would be BAD.

_maybe_, depending on what the slot looks like, we could do something
like:

offset = regs->ip - (unsigned long)bp_int3_addr;
regs->ip = (unsigned long)bp_int3_handler + offset;

That is: jump into the slot at the same offset we hit the INT3, but this
is quickly getting yuck.
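
Spelled out (completely untested, and it only works if the slot at
bp_int3_handler really is a byte-for-byte copy of the original
instructions at bp_int3_addr, which for optprobes it is not today), the
handler would become something like:

	int poke_int3_handler(struct pt_regs *regs)
	{
		unsigned long offset;

		smp_rmb();

		if (likely(!bp_patching_in_progress))
			return 0;

		if (user_mode(regs))
			return 0;

		if (regs->ip < (unsigned long)bp_int3_addr ||
		    regs->ip >= (unsigned long)bp_int3_addr + bp_int3_length)
			return 0;

		/* resume in the slot at the same offset we hit the INT3 */
		offset = regs->ip - (unsigned long)bp_int3_addr;
		regs->ip = (unsigned long)bp_int3_handler + offset;

		return 1;
	}
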

> /* set up the specified breakpoint handler */
> @@ -900,9 +906,12 @@ NOKPROBE_SYMBOL(poke_int3_handler);
> void text_poke_bp(void *addr, const void *opcode, size_t len, void *handler)
> {
> unsigned char int3 = 0xcc;
> + void *kaddr = addr;
> + struct insn insn;
>
> bp_int3_handler = handler;
> bp_int3_addr = (u8 *)addr + sizeof(int3);
> + bp_int3_length = len - sizeof(int3);
> bp_patching_in_progress = true;
>
> lockdep_assert_held(&text_mutex);
> @@ -913,7 +922,14 @@ void text_poke_bp(void *addr, const void *opcode, size_t len, void *handler)
> */
> smp_wmb();
>
> - text_poke(addr, &int3, sizeof(int3));
> + do {
> + kernel_insn_init(&insn, kaddr, MAX_INSN_SIZE);
> + insn_get_length(&insn);
> +
> + text_poke(kaddr, &int3, sizeof(int3));
> +
> + kaddr += insn.length;
> + } while (kaddr < addr + len);
>
> on_each_cpu(do_sync_core, NULL, 1);
>
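
FWIW that loop just leans on the in-kernel x86 decoder to find the
instruction boundaries in [addr, addr + len). Pulled out into a helper
(sketch only, the helper name is made up) it would look like:

	#include <asm/insn.h>

	/*
	 * Sketch: walk each instruction start in [addr, addr + len) using
	 * the in-kernel x86 decoder and call func() on it. The patch above
	 * open-codes this with func() being the INT3 poke.
	 */
	static void for_each_insn(void *addr, size_t len,
				  void (*func)(void *kaddr))
	{
		void *kaddr = addr;
		struct insn insn;

		do {
			kernel_insn_init(&insn, kaddr, MAX_INSN_SIZE);
			insn_get_length(&insn);

			func(kaddr);

			kaddr += insn.length;
		} while (kaddr < addr + len);
	}
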