Re: [PATCH] riscv: Fix text patching when icache flushes use IPIs

From: Alexandre Ghiti
Date: Thu Feb 08 2024 - 08:23:30 EST


Hi Andrea,

On Thu, Feb 8, 2024 at 12:42 PM Andrea Parri <parri.andrea@xxxxxxxxx> wrote:
>
> > +static int __ftrace_modify_code(void *data)
> > +{
> > + struct ftrace_modify_param *param = data;
> > +
> > + if (atomic_inc_return(&param->cpu_count) == num_online_cpus()) {
> > + ftrace_modify_all_code(param->command);
> > + atomic_inc(&param->cpu_count);
>
> I stared at ftrace_modify_all_code() for a bit, but honestly I don't see
> what prevents the ->cpu_count increment from being reordered before the
> insn write(s) (architecturally) now that you have removed the IPI dance:
> perhaps add an smp_wmb() right before the atomic_inc() (or promote the
> latter to a (void)atomic_inc_return_release()) and/or an inline comment
> saying why such reordering is not possible?

I did not even think of that, and it actually makes sense, so I'll go
with what you propose: I'll replace atomic_inc() with
atomic_inc_return_release(), and I'll add the following comment if
that's ok with you:

"Make sure the patching store is effective *before* we increment the
counter which releases all waiting cpus"
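
For the writer side, that would give something like the below (a
minimal sketch on top of your quoted hunk, only the success branch
shown):

        if (atomic_inc_return(&param->cpu_count) == num_online_cpus()) {
                ftrace_modify_all_code(param->command);
                /*
                 * Make sure the patching store is effective *before* we
                 * increment the counter which releases all waiting CPUs
                 * by using the release variant of the atomic increment.
                 */
                (void)atomic_inc_return_release(&param->cpu_count);
        }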

>
>
> > + } else {
> > + while (atomic_read(&param->cpu_count) <= num_online_cpus())
> > + cpu_relax();
> > + smp_mb();
>
> I see that you've lifted/copied the memory barrier from patch_text_cb():
> what's its point? AFAIU, the barrier has no ordering effect on
> program-order later insn fetches; perhaps the code was based on some
> legacy/old version of Zifencei? IAC, comments, comments, ... or maybe
> just remove that memory barrier?

Honestly, I looked at it for a minute, did not understand its purpose,
and said to myself "ok, that can't hurt anyway, I may be missing
something".

FWIW, I see that arm64 uses isb() here. If you don't see its purpose,
I'll remove it (here and where I copied it).
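
If we drop it, the waiter side would reduce to something like this
(again just a sketch, assuming the fence.i in local_flush_icache_all()
below is what actually synchronizes the local instruction stream):

        } else {
                while (atomic_read(&param->cpu_count) <= num_online_cpus())
                        cpu_relax();
                /* No barrier here: local_flush_icache_all() emits fence.i. */
        }

        local_flush_icache_all();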

>
>
> > + }
> > +
> > + local_flush_icache_all();
> > +
> > + return 0;
> > +}
>
> [...]
>
>
> > @@ -232,8 +230,7 @@ static int patch_text_cb(void *data)
> > if (atomic_inc_return(&patch->cpu_count) == num_online_cpus()) {
> > for (i = 0; ret == 0 && i < patch->ninsns; i++) {
> > len = GET_INSN_LENGTH(patch->insns[i]);
> > - ret = patch_text_nosync(patch->addr + i * len,
> > - &patch->insns[i], len);
> > + ret = patch_insn_write(patch->addr + i * len, &patch->insns[i], len);
> > }
> > atomic_inc(&patch->cpu_count);
> > } else {
> > @@ -242,6 +239,8 @@ static int patch_text_cb(void *data)
> > smp_mb();
> > }
> >
> > + local_flush_icache_all();
> > +
> > return ret;
> > }
> > NOKPROBE_SYMBOL(patch_text_cb);
>
> My above remarks/questions also apply to this function.
>
>
> On a last topic, although somewhat orthogonal to the scope of this patch,
> I'm not sure the patch_{map,unmap}() dance in our patch_insn_write() is
> correct: I can see why we may want (or need) the local TLB flush before
> returning from patch_{map,unmap}(), but does a local flush suffice?
> For comparison, arm64 seems to go through a complete dsb-tlbi-dsb(-isb)
> sequence in their unmapping stage (and apparently relies on "no caching
> of invalid ptes" in their mapping stage). Of course, "broadcasting" our
> (riscv's) TLB invalidations will necessarily introduce some complexity...
>
> Thoughts?

To avoid a remote TLBI, could we simply disable preemption before the
first patch_map()? arm64 disables irqs, but that seems overkill to me;
then again, maybe I'm missing something.
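
i.e., something along these lines in patch_insn_write() (a rough
sketch of the idea only, assuming the current shape of that function:
patch_map()/patch_unmap(), FIX_TEXT_POKE0/1 and the across_pages
handling stay as they are today):

        /*
         * Prevent migration so the fixmap mapping and the local TLB
         * flush in patch_unmap() happen on the same CPU.
         */
        preempt_disable();

        if (across_pages)
                patch_map(addr + PAGE_SIZE, FIX_TEXT_POKE1);

        waddr = patch_map(addr, FIX_TEXT_POKE0);

        ret = copy_to_kernel_nofault(waddr, insn, len);

        patch_unmap(FIX_TEXT_POKE0);

        if (across_pages)
                patch_unmap(FIX_TEXT_POKE1);

        preempt_enable();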

Thanks for your comments, Andrea,

Alex

>
> Andrea