Re: [PATCH] riscv: kprobe: Optimize kprobe with accurate atomicity

From: Guo Ren
Date: Tue Jan 31 2023 - 03:15:22 EST


On Tue, Jan 31, 2023 at 2:40 PM Björn Töpel <bjorn@xxxxxxxxxx> wrote:
>
> Guo Ren <guoren@xxxxxxxxxx> writes:
>
> > On Mon, Jan 30, 2023 at 11:28 PM Björn Töpel <bjorn@xxxxxxxxxx> wrote:
> >>
> >> Guo Ren <guoren@xxxxxxxxxx> writes:
> >>
> >> >> In the serie of RISCV OPTPROBES [1], it patches a long-jump instructions pair
> >> >> AUIPC/JALR in kernel text, so in order to ensure other CPUs does not execute
> >> >> in the instructions that will be modified, it is still need to stop other CPUs
> >> >> via patch_text API, or you have any better solution to achieve the purpose?
> >> > - The stop_machine is an expensive way all architectures should
> >> > avoid, and you could keep that in your OPTPROBES implementation files
> >> > with static functions.
> >> > - The stop_machine couldn't work with PREEMPTION, so your
> >> > implementation needs to work with !PREEMPTION.
> >>
> >> ...and stop_machine() with !PREEMPTION is broken as well, when you're
> >> replacing multiple instructions (see Mark's post at [1]). The
> >> stop_machine() dance might work when you're replacing *one* instruction,
> >> not multiple as in the RISC-V case. I'll expand on this in a comment in
> >> the OPTPROBES v6 series.
> >>
> >> >> > static void __kprobes arch_prepare_simulate(struct kprobe *p)
> >> >> > @@ -114,16 +120,23 @@ void *alloc_insn_page(void)
> >> >> > /* install breakpoint in text */
> >> >> > void __kprobes arch_arm_kprobe(struct kprobe *p)
> >> >> > {
> >> >> > - if ((p->opcode & __INSN_LENGTH_MASK) == __INSN_LENGTH_32)
> >> >> > - patch_text(p->addr, __BUG_INSN_32);
> >> >> > - else
> >> >> > - patch_text(p->addr, __BUG_INSN_16);
> >> >> > +#ifdef CONFIG_RISCV_ISA_C
> >> >> > + u32 opcode = __BUG_INSN_16;
> >> >> > +#else
> >> >> > + u32 opcode = __BUG_INSN_32;
> >> >> > +#endif
> >> >> > + patch_text_nosync(p->addr, &opcode, GET_INSN_LENGTH(opcode));
> >> >>
> >> >> Sounds good, but it will leave some RVI instruction truncated in kernel text,
> >> >> i doubt kernel behavior depends on the rest of the truncated instruction, well,
> >> >> it needs more strict testing to prove my concern :)
> >> > I do this on purpose, and it doesn't cause any problems. Don't worry;
> >> > IFU hw must enforce the fetch sequence, and there is no way to execute
> >> > broken instructions even in the speculative execution path.
> >>
> >> This is stretching reality a bit much. ARMv8, e.g., has a chapter in the
> >> Arm ARM [2] Appendix B "Concurrent modification and execution of
> >> instructions" (CMODX). *Some* instructions can be replaced concurrently,
> >> and others cannot without caution. Assuming that that all RISC-V
> >> implementations can, is a stretch. RISC-V hasn't even specified the
> >> behavior of CMODX (which is problematic).
> > Here we only use one sw/sh instruction to store a 32bit/16bit aligned element:
> >
> > INSN_0 <- ebreak (16bit/32bit aligned)
> > INSN_1
> > INSN_2
> >
> > The ebreak would cause an exception which implies a huge fence here.
> > No machine could give a speculative execution for the ebreak path.
>
> It's the concurrent modification that I was referring to (removing
> stop_machine()). You're saying "it'll always work", I'm saying "I'm not
> so sure". :-) E.g., writing c.ebreak on an 32b insn. Can you say that
Software must ensure that c.ebreak is written at the head of the 32b insn.

That means the IFU only sees one of:
- c.ebreak + broken/illegal insn
or
- the original insn

Even in the worst case, e.g. an IFU that fetches instructions one by one:
If the IFU gets the original insn, it never decodes the broken/illegal insn.
If the IFU gets c.ebreak + the broken/illegal insn, then an ebreak
exception is raised.

Because c.ebreak would raise an exception, I don't see any problem.
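
A minimal sketch of the two views described above, as a plain C model
rather than kernel code. The encodings match __BUG_INSN_16 and the
length mask in asm/bug.h; model_fetch() and model_arm() are hypothetical
names for illustration only. It models the software-visible states, not
the fetch behavior of any particular hardware:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encodings as in arch/riscv/include/asm/bug.h. */
#define INSN_LENGTH_MASK 0x3u
#define INSN_LENGTH_32   0x3u
#define C_EBREAK         0x9002u   /* __BUG_INSN_16 */

enum fetch_result { FETCH_ORIGIN_INSN, FETCH_EBREAK_TRAP, FETCH_OTHER_16 };

/*
 * Hypothetical model: text is an array of 16-bit parcels, and the
 * instruction length is decoded from the first parcel that is read.
 */
static enum fetch_result model_fetch(const uint16_t *text, size_t idx)
{
    uint16_t first = text[idx];

    if ((first & INSN_LENGTH_MASK) == INSN_LENGTH_32)
        return FETCH_ORIGIN_INSN;  /* uses text[idx] and text[idx + 1] */
    if (first == C_EBREAK)
        return FETCH_EBREAK_TRAP;  /* traps; text[idx + 1] is not decoded */
    return FETCH_OTHER_16;
}

/* One aligned 16-bit store (the "sh" case) over the head of the insn. */
static void model_arm(uint16_t *text, size_t n, size_t idx)
{
    assert(idx < n);
    text[idx] = C_EBREAK;
}
```

With text = { 0x0513, 0x0015 } (addi a0, a0, 1), model_fetch() reports
the origin insn before model_arm() and the ebreak trap after it, while
the second parcel stays untouched.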


> will work on all RISC-V implementations? Do you have examples of
> hardware where it will work?
For c.ebreak, this comes naturally. It is hard to build a hardware
implementation that gets this wrong.

>
>
> Björn



--
Best Regards
Guo Ren