Re: [PATCH -next V6 1/7] riscv: ftrace: Fixup panic by disabling preemption

From: Guo Ren
Date: Sun Jan 29 2023 - 00:39:44 EST


On Sat, Jan 28, 2023 at 6:00 PM Guo Ren <guoren@xxxxxxxxxx> wrote:
>
> On Thu, Jan 12, 2023 at 8:05 PM Mark Rutland <mark.rutland@xxxxxxx> wrote:
> >
> > On Wed, Jan 11, 2023 at 09:22:09PM +0800, Guo Ren wrote:
> > > On Tue, Jan 10, 2023 at 1:20 AM Mark Rutland <mark.rutland@xxxxxxx> wrote:
> > > >
> > > > On Sat, Jan 07, 2023 at 08:35:43AM -0500, guoren@xxxxxxxxxx wrote:
> > > > > From: Andy Chiu <andy.chiu@xxxxxxxxxx>
> > > > >
> > > > > In RISC-V, we must use an AUIPC + JALR pair to encode an
> > > > > immediate, forming a jump to an address more than 4K away. This
> > > > > may cause errors if we want to enable kernel preemption and
> > > > > remove the patching code's dependency on stop_machine(). For
> > > > > example, if a task is switched out right after the auipc, and
> > > > > the ftrace function is changed before the task is switched back,
> > > > > the jalr would jump to an address whose updated 11:0 bits are
> > > > > mixed with the previous XLEN:12 part.
> > > > >
> > > > > p: patched area performed by dynamic ftrace
> > > > > ftrace_prologue:
> > > > > p| REG_S ra, -SZREG(sp)
> > > > > p| auipc ra, 0x? ------------> preempted
> > > > > ...
> > > > > change ftrace function
> > > > > ...
> > > > > p| jalr -?(ra) <------------- switched back
> > > > > p| REG_L ra, -SZREG(sp)
> > > > > func:
> > > > > xxx
> > > > > ret
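
To make the hazard concrete, the call offset is split roughly like
this (just an illustrative sketch, the helper name is made up, not
the actual arch/riscv ftrace code):

/*
 * auipc supplies bits XLEN-1:12 (plus a carry), jalr supplies the
 * sign-extended bits 11:0.  Executing a stale auipc together with a
 * rewritten jalr therefore lands at an unrelated address.
 */
static void split_call_offset(long offset, long *hi20, long *lo12)
{
	*lo12 = ((offset & 0xfff) ^ 0x800) - 0x800; /* sign-extended, goes into jalr */
	*hi20 = (offset - *lo12) >> 12;             /* goes into auipc */
}
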
> > > >
> > > > What happens on SMP but !PREEMPTION? Can't a CPU be in the middle of
> > > > this while you're patching the sequence?
> > > Yes, with PREEMPTION, a timer interrupt between auipc & jalr may
> > > cause a context switch. And riscv uses stop_machine() for
> > > patch_text(), so we may modify the auipc part, but the preempted
> > > task only executes the jalr part when it returns.
> >
> > Please re-read my question; "!PREEMPTION" means "NOT PREEMPTION".
> >
> > Ignore preemption entirely and assume two CPUs X and Y are running code
> > concurrently. Assume CPU X is in the ftrace prologue, and CPU Y is patching
> > that prologue while CPU X is executing it.
> >
> > Is that prevented somehow? If not, what happens in that case?
> >
> > At the very least you can have exactly the same case as on a preemptible kernel
> > (and in a VM, the hypervisor can preempt the guest at arbitrary times),
> > because CPU X could end up executing a mixture of the old and new instructions.
> >
> > More generally, if you don't have strong rules about concurrent modification
> > and execution of instructions, it may not be safe to modify an instruction as
> > it is being executed (e.g. if the CPU's instruction fetches aren't atomic).
> >
> > > > Do you have any guarantee as to the atomicity and ordering of instruction
> > > > fetches?
> > > Not yet. If the region is short, we could use a nop + jalr pair instead.
> >
> > Ok, so as above I do not understand how this is safe. Maybe I am missing
> > something, but if you don't have a guarantee as to ordering I don't see how you
> > can safely patch this even if you have atomicity of each instruction update.
> >
> > Note that if you don't have atomicity of instruction fetches you *cannot*
> > safely concurrently modify and execute instructions.
> >
> > > Using only one jalr instruction makes the entry atomic.
> >
> > I'll have to take your word for that.
> >
> > As above, I don't think this sequence is safe regardless.
> >
> > > There are already several proposed solutions:
> > > 1. Make stop_machine() guarantee that all CPUs are outside the
> > > preemption point.
> > > 2. Expand -fpatchable-function-entry from 4 to 24, and make the
> > > detour code atomic.
> > > 3. We want to propose a solution where the auipc masks IRQs in
> > > hardware.
> > > For more details, see:
> > > https://www.youtube.com/watch?v=4JkkkXuEvCw
> >
> > Ignoring things which require HW changes, you could consider doing something
> > like what I'm doing for arm64 with DYNAMIC_FTRACE_WITH_CALL_OPS:
> >
> > https://lore.kernel.org/lkml/20230109135828.879136-1-mark.rutland@xxxxxxx/
> The idea of DYNAMIC_FTRACE_WITH_CALL_OPS (using a data load plus an
> indirect jump instead of auipc+jalr) is similar to Andy's solution
> (see the youtube link, last page of the slides). But the key problem
> is that it also expands the function prologue. The 64BIT prologue is
> already expensive, and we can't afford to make it bigger. I would
> instead seek a new atomic auipc+jalr ISA extension to solve this
> problem.
The atomicity here means:
- the auipc + jalr pair won't be interrupted
- the auipc + jalr pair should be 64-bit aligned, so that one sd
  instruction can update both of them atomically.
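
A rough sketch of what that aligned update could look like (a
hypothetical helper, not the current patch_text() code; it assumes
little-endian instruction memory and exactly the sd-vs-ifetch
atomicity guarantee discussed above):

#include <linux/compiler.h>
#include <linux/types.h>

/*
 * Sketch only: rewrite a naturally aligned auipc+jalr pair with one
 * 64-bit store, so another hart can only fetch the old pair or the
 * new pair, never a mix of the two.
 */
static void update_auipc_jalr(u32 *insn, u32 new_auipc, u32 new_jalr)
{
	u64 pair = ((u64)new_jalr << 32) | new_auipc;

	WRITE_ONCE(*(u64 *)insn, pair);	/* one aligned sd */
	/* then fence.i / remote icache flush on all harts as usual */
}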


>
> DYNAMIC_FTRACE_WITH_CALL_OPS would speed up ftrace_(regs)_caller
> (mostly for kernel debugging), but it won't help
> DYNAMIC_FTRACE_WITH_DIRECT_CALLS. So I don't care much about the
> ftrace_(regs)_caller performance gain.
>
> >
> > ... which would replace the address generation with a load, which can be
> > atomic, and would give you a number of other benefits (e.g. avoiding branch
> > range limitations, performance benefits as in the cover letter).
> >
> > Thanks,
> > Mark.
>
>
>
> --
> Best Regards
> Guo Ren



--
Best Regards
Guo Ren