Re: [PATCH] riscv: kprobe: Optimize kprobe with accurate atomicity

From: Guo Ren
Date: Thu Feb 16 2023 - 21:29:05 EST


On Thu, Feb 16, 2023 at 3:54 PM Björn Töpel <bjorn@xxxxxxxxxx> wrote:
>
> Guo Ren <guoren@xxxxxxxxxx> writes:
>
> > On Tue, Jan 31, 2023 at 6:57 PM Andrea Parri <parri.andrea@xxxxxxxxx> wrote:
> >>
> >> > > It's the concurrent modification that I was referring to (removing
> >> > > stop_machine()). You're saying "it'll always work", I'm saying "I'm not
> >> > > so sure". :-) E.g., writing c.ebreak on an 32b insn. Can you say that
> >> > Software must ensure write c.ebreak on the head of an 32b insn.
> >> >
> >> > That means IFU only see:
> >> > - c.ebreak + broken/illegal insn.
> >> > or
> >> > - origin insn
> >> >
> >> > Even in the worst case, such as IFU fetches instructions one by one:
> >> > If the IFU gets the origin insn, it will skip the broken/illegal insn.
> >> > If the IFU gets the c.ebreak + broken/illegal insn, then an ebreak
> >> > exception is raised.
> >> >
> >> > Because c.ebreak would raise an exception, I don't see any problem.
> >>
> >> That's the problem, this discussion is:
> >>
> >> Reviewer: "I'm not sure, that's not written in our spec"
> >> Submitter: "I said it, it's called -accurate atomicity-"
> > I really don't see any hardware that could break the atomicity of this
> > c.ebreak scenario:
> > - c.ebreak on the head of 32b insn
> > - ebreak on an aligned 32b insn
> >
> > If IFU fetches with cacheline, all is atomicity.
> > If IFU fetches with 16bit one by one, the first c.ebreak would raise
> > an exception and skip the next broke/illegal instruction.
> > Even if IFU fetches without any sequence, the IDU must decode one by
> > one, right? The first half c.ebreak would protect and prevent the next
> > broke/illegal instruction. Speculative execution on broke/illegal
> > instruction won't cause any exceptions.
> >
> > It's a common issue, not a specific ISA issue.
> > 32b instruction A -> 16b ebreak + 16b broken/illegal -> 32b
> > instruction A. It's safe to transform.
>
> Waking up this thread again, now that Changbin has showed some interest
> from another thread [1].
>
> Guo, we can't really add your patches, and claim that they're generic,
> "works on all" RISC-V systems. While it might work for your I/D coherent
> system, that does not imply that it'll work on all platforms. RISC-V
> allows for implementations that are I/D incoherent, and here your
> IFU-implementations arguments do not hold. I'd really recommend to
> readup on [2].
Sorry, [2] isn't related to this patch.

This patch does not have an I/D incoherence problem, because we
broadcast a fence.i IPI in patch_text_nosync.

By comparison, the stop_machine version pays for a nested IPI
broadcast, since flush_icache_all runs underneath stop_machine:
stop_machine -> patch_text_nosync -> flush_icache_all
void flush_icache_all(void)
{
	local_flush_icache_all();

	if (IS_ENABLED(CONFIG_RISCV_SBI))
		sbi_remote_fence_i(NULL);
	else
		on_each_cpu(ipi_remote_fence_i, NULL, 1);
}
EXPORT_SYMBOL(flush_icache_all);


>
> Now how could we move on with your patches? Get it in a spec, or fold
> the patches in as a Kconfig.socs-thing for the platforms where this is
> OK. What are you thoughts on the latter?

I wasn't talking about I/D coherence; my point is about the basic
size of an instruction element.
In an I/D cache system, why couldn't an LSU store-half guarantee
atomicity for an I-cache fetch? How could the I-cache fetch only one
byte of that stored halfword?
We already assume this guarantee in the riscv jump_label
implementation, so why couldn't this patch rely on it too?
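
To make the argument concrete, here is a minimal user-space sketch (not
kernel code) of the two states a fetch could observe, assuming the
halfword store is seen either entirely or not at all. That assumption
is exactly the point under discussion, so this models the claim rather
than proving it. The helper names insn_len, patch_head and
traps_at_head are made up for illustration; only the c.ebreak (0x9002)
and ebreak (0x00100073) encodings and the low-two-bits length rule come
from the ISA.

```c
#include <assert.h>
#include <stdint.h>

#define C_EBREAK 0x9002u      /* 16-bit c.ebreak encoding */
#define EBREAK   0x00100073u  /* 32-bit ebreak encoding */

/*
 * Instruction length in bytes implied by the first (lowest-addressed)
 * 16-bit parcel. Low bits 0b11 mean a 32-bit instruction, anything
 * else is a compressed one. Encodings longer than 32 bits are ignored.
 */
static int insn_len(uint16_t first_parcel)
{
	return (first_parcel & 0x3u) == 0x3u ? 4 : 2;
}

/*
 * Model one halfword store over the head of a 32-bit instruction.
 * Instruction parcels are little-endian, so the head is the low half.
 */
static uint32_t patch_head(uint32_t insn, uint16_t half)
{
	assert(insn_len((uint16_t)insn) == 4);
	return (insn & 0xffff0000u) | half;
}

/* Would decoding this word raise a breakpoint at its first parcel? */
static int traps_at_head(uint32_t word)
{
	return (uint16_t)word == C_EBREAK || word == EBREAK;
}
```

With orig = 0x00a50533 (add a0, a0, a0), patch_head(orig, C_EBREAK)
gives 0x00a59002: the first parcel decodes as a 16-bit c.ebreak, and
the leftover upper half is never reached because the trap comes first.
Whether real hardware can observe anything other than these two states
is what the sketch cannot tell us.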

>
>
> Björn
>
> [1] https://lore.kernel.org/linux-riscv/20230215034532.xs726l7mp6xlnkdf@M910t/
> [2] https://github.com/riscv/riscv-j-extension/blob/master/id-consistency-proposal.pdf



--
Best Regards
Guo Ren