Re: [PATCH] riscv: kprobe: Optimize kprobe with accurate atomicity

From: liaochang (A)
Date: Fri Jan 27 2023 - 22:53:30 EST


Hi, Guo Ren

On 2023/1/27 0:15, guoren@xxxxxxxxxx wrote:
> From: Guo Ren <guoren@xxxxxxxxxxxxxxxxx>
>
> The previous implementation was based on the stop_machine mechanism,
> which slowed down arm/disarm_kprobe. Using a bare minimum ebreak
> instruction gives accurate atomicity.
>
> This patch removes riscv's stop_machine-based patch_text, so riscv
> retains only patch_text_nosync, and developers need to be more
> careful about patching atomicity.

The RISCV OPTPROBES series [1] patches an AUIPC/JALR long-jump
instruction pair into the kernel text, so in order to ensure other CPUs
do not execute within the instructions being modified, it still needs
to stop the other CPUs via the patch_text API. Or do you have a better
solution to achieve that?

Thanks.
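For illustration, a rough sketch of the hazard (optprobe_patch and
optprobe_patch_cb are made-up names for this mail, taken from neither
series): the long-jump pair is two separate 32-bit stores, so a hart
fetching concurrently could execute a mix of old and new instructions,
e.g. the old first instruction followed by the new JALR, jumping
through a garbage register. The writer therefore still has to park the
other CPUs, e.g. via stop_machine():

struct optprobe_patch {			/* hypothetical */
	void *addr;
	u32 insns[2];			/* AUIPC + JALR long-jump pair */
};

static int optprobe_patch_cb(void *data)
{
	struct optprobe_patch *op = data;

	/*
	 * All other CPUs are spinning inside stop_machine(), so the
	 * two 32-bit writes below can never be observed half-done by
	 * a concurrently executing hart.
	 */
	return patch_text_nosync(op->addr, op->insns, sizeof(op->insns));
}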

>
> When CONFIG_RISCV_ISA_C=n, the ebreak replaces the whole instruction.
> When CONFIG_RISCV_ISA_C=y, the patch uses the 16-bit c.ebreak
> instruction, which may occupy only the first half of a 32-bit
> instruction and leave the rest of the broken instruction behind.
> Because the ebreak detours the control flow around it, leaving it in
> the kernel text memory is okay.
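To spell out the layout being described (a sketch, assuming
CONFIG_RISCV_ISA_C=y and a 32-bit instruction at p->addr):

/*
 *   p->addr + 0: c.ebreak (2 bytes, traps into the kprobe handler)
 *   p->addr + 2: stale upper half of the original 32-bit insn
 *
 * Sequential execution traps at p->addr before the stale half-word
 * can be reached, so it is never decoded as an instruction.
 */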
>
> Signed-off-by: Guo Ren <guoren@xxxxxxxxxxxxxxxxx>
> Signed-off-by: Guo Ren <guoren@xxxxxxxxxx>
> ---
> arch/riscv/include/asm/patch.h | 1 -
> arch/riscv/kernel/patch.c | 33 ------------------------------
> arch/riscv/kernel/probes/kprobes.c | 29 ++++++++++++++++++--------
> 3 files changed, 21 insertions(+), 42 deletions(-)
>
> diff --git a/arch/riscv/include/asm/patch.h b/arch/riscv/include/asm/patch.h
> index 9a7d7346001e..2500782e6f5b 100644
> --- a/arch/riscv/include/asm/patch.h
> +++ b/arch/riscv/include/asm/patch.h
> @@ -7,6 +7,5 @@
> #define _ASM_RISCV_PATCH_H
>
> int patch_text_nosync(void *addr, const void *insns, size_t len);
> -int patch_text(void *addr, u32 insn);
>
> #endif /* _ASM_RISCV_PATCH_H */
> diff --git a/arch/riscv/kernel/patch.c b/arch/riscv/kernel/patch.c
> index 765004b60513..8bd51ed8b806 100644
> --- a/arch/riscv/kernel/patch.c
> +++ b/arch/riscv/kernel/patch.c
> @@ -98,36 +98,3 @@ int patch_text_nosync(void *addr, const void *insns, size_t len)
> return ret;
> }
> NOKPROBE_SYMBOL(patch_text_nosync);
> -
> -static int patch_text_cb(void *data)
> -{
> - struct patch_insn *patch = data;
> - int ret = 0;
> -
> - if (atomic_inc_return(&patch->cpu_count) == num_online_cpus()) {
> - ret =
> - patch_text_nosync(patch->addr, &patch->insn,
> - GET_INSN_LENGTH(patch->insn));
> - atomic_inc(&patch->cpu_count);
> - } else {
> - while (atomic_read(&patch->cpu_count) <= num_online_cpus())
> - cpu_relax();
> - smp_mb();
> - }
> -
> - return ret;
> -}
> -NOKPROBE_SYMBOL(patch_text_cb);
> -
> -int patch_text(void *addr, u32 insn)
> -{
> - struct patch_insn patch = {
> - .addr = addr,
> - .insn = insn,
> - .cpu_count = ATOMIC_INIT(0),
> - };
> -
> - return stop_machine_cpuslocked(patch_text_cb,
> - &patch, cpu_online_mask);
> -}
> -NOKPROBE_SYMBOL(patch_text);
> diff --git a/arch/riscv/kernel/probes/kprobes.c b/arch/riscv/kernel/probes/kprobes.c
> index 475989f06d6d..27f8960c321c 100644
> --- a/arch/riscv/kernel/probes/kprobes.c
> +++ b/arch/riscv/kernel/probes/kprobes.c
> @@ -24,12 +24,18 @@ post_kprobe_handler(struct kprobe *, struct kprobe_ctlblk *, struct pt_regs *);
> static void __kprobes arch_prepare_ss_slot(struct kprobe *p)
> {
> unsigned long offset = GET_INSN_LENGTH(p->opcode);
> +#ifdef CONFIG_RISCV_ISA_C
> + u32 opcode = __BUG_INSN_16;
> +#else
> + u32 opcode = __BUG_INSN_32;
> +#endif
>
> p->ainsn.api.restore = (unsigned long)p->addr + offset;
>
> - patch_text(p->ainsn.api.insn, p->opcode);
> - patch_text((void *)((unsigned long)(p->ainsn.api.insn) + offset),
> - __BUG_INSN_32);
> + patch_text_nosync(p->ainsn.api.insn, &p->opcode, offset);
> + patch_text_nosync((void *)((unsigned long)(p->ainsn.api.insn) + offset),
> + &opcode, GET_INSN_LENGTH(opcode));
> +

I have submitted a similar optimization for patching the single-step
slot [2], and it is indeed safe to use a compressed breakpoint
(c.ebreak) in the single-step slot no matter what type the patched
instruction is.

Thanks.
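For reference, the resulting slot layout (a sketch; offset is 4 for a
32-bit probed instruction, 2 for a 16-bit one):

/*
 *   insn slot + 0:      copy of the probed instruction
 *   insn slot + offset: ebreak/c.ebreak, trapping back into the
 *                       kprobe handler once the single step is done
 *
 * The slot is private to this probe and only ever entered at offset
 * 0, so a 16-bit c.ebreak is safe here regardless of the probed
 * instruction's length.
 */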

> }
>
> static void __kprobes arch_prepare_simulate(struct kprobe *p)
> @@ -114,16 +120,23 @@ void *alloc_insn_page(void)
> /* install breakpoint in text */
> void __kprobes arch_arm_kprobe(struct kprobe *p)
> {
> - if ((p->opcode & __INSN_LENGTH_MASK) == __INSN_LENGTH_32)
> - patch_text(p->addr, __BUG_INSN_32);
> - else
> - patch_text(p->addr, __BUG_INSN_16);
> +#ifdef CONFIG_RISCV_ISA_C
> + u32 opcode = __BUG_INSN_16;
> +#else
> + u32 opcode = __BUG_INSN_32;
> +#endif
> + patch_text_nosync(p->addr, &opcode, GET_INSN_LENGTH(opcode));

Sounds good, but it will leave some RVI instructions truncated in the
kernel text. I am not sure whether any kernel behavior depends on the
remaining half of a truncated instruction; it needs more rigorous
testing to address my concern :)
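For what it's worth, restoring with the original instruction's own
length on disarm would also rewrite any stale tail, instead of relying
on it still being intact (an untested sketch, not from either patch):

void __kprobes arch_disarm_kprobe(struct kprobe *p)
{
	/*
	 * p->opcode holds the original instruction, so its own length
	 * covers both the breakpoint and any bytes a 16-bit c.ebreak
	 * left behind when it overlaid a 32-bit instruction.
	 */
	patch_text_nosync(p->addr, &p->opcode, GET_INSN_LENGTH(p->opcode));
}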

> }
>
> /* remove breakpoint from text */
> void __kprobes arch_disarm_kprobe(struct kprobe *p)
> {
> - patch_text(p->addr, p->opcode);
> +#ifdef CONFIG_RISCV_ISA_C
> + u32 opcode = __BUG_INSN_16;
> +#else
> + u32 opcode = __BUG_INSN_32;
> +#endif
> + patch_text_nosync(p->addr, &p->opcode, GET_INSN_LENGTH(opcode));
> }
>
> void __kprobes arch_remove_kprobe(struct kprobe *p)

[1] - https://lore.kernel.org/lkml/20230127130541.1250865-9-chenguokai17@xxxxxxxxxxxxxxxx/
[2] - https://lore.kernel.org/lkml/20220927022435.129965-1-liaochang1@xxxxxxxxxx/T/

--
BR,
Liao, Chang