Re: [PATCH -next V6 1/7] riscv: ftrace: Fixup panic by disabling preemption

From: Mark Rutland
Date: Mon Jan 30 2023 - 06:18:10 EST


On Sat, Jan 28, 2023 at 06:00:20PM +0800, Guo Ren wrote:
> On Thu, Jan 12, 2023 at 8:05 PM Mark Rutland <mark.rutland@xxxxxxx> wrote:

> > Ignoring things which require HW changes, you could consider doing something
> > like what I'm doing for arm64 with DYNAMIC_FTRACE_WITH_CALL_OPS:
> >
> > https://lore.kernel.org/lkml/20230109135828.879136-1-mark.rutland@xxxxxxx/
> The idea of DYNAMIC_FTRACE_WITH_CALL_OPS (Using data load/store +
> indirect jump instead of auipc+jalr) is similar to Andy's solution
> (See youtube link, last page of ppt).

Sure; I was present in that room and I spoke with Andy at the time.

The solutions are similar, but the important detail with
DYNAMIC_FTRACE_WITH_CALL_OPS is that the load and indirect branch is moved into
a common trampoline so that each call-site can be smaller. The ops pointer is
placed *before* the function entry point and doesn't need to be skipped with a
direct branch (which Andy's approach could also do if he aligned functions
similarly).

> But the key problem is you also expand the size of the prologue of the
> function. 64BIT is already expensive, and we can't afford more of it. I would
> change to seek a new atomic auipc+jalr ISA extension to solve this problem.

Sure, and that's nice for *new* hardware, but I'm talking about a solution
which works on *current* hardware.

> DYNAMIC_FTRACE_WITH_CALL_OPS would speed up ftrace_(regs)_caller (Mostly for
> kernel debug), but it won't help DYNAMIC_FTRACE_WITH_DIRECT_CALLS. So I do
> not so care about the ftrace_(regs)_caller performance gain.

Actually, the plan is that it *will* help DYNAMIC_FTRACE_WITH_DIRECT_CALLS; we
just didn't make all the necessary changes in one go.

Florent Revest is looking at implementing that by placing the direct call
pointer into the ops, so the common trampoline can load that directly.

He has an older draft available at:

https://github.com/FlorentRevest/linux/commits/indirect-direct-calls-3

... and since then, having spoken to Steven, we came up with a plan to make all
direct calls require an ops (which is the case for DIRECT_CALLS_MULTI), and
place a trampoline pointer in the ops.

That way, the common trampoline can do something like (in arm64 asm):

| LDR <tmp>, [<ops>, #OPS_TRAMP_PTR]
| CBNZ <tmp>, __call_tramp_directly
|
| // ... regular regs trampoline logic here
|
| __call_tramp_directly:
|
| // ... shuffle registers here
|
| BR <tmp>

... and I believe the same should work for riscv.

Thanks,
Mark.