Re: [PATCH 5/6] sched/preempt: add PREEMPT_DYNAMIC using static keys

From: Frederic Weisbecker
Date: Mon Dec 13 2021 - 17:05:09 EST


On Tue, Nov 09, 2021 at 05:24:07PM +0000, Mark Rutland wrote:
> Where an architecture selects HAVE_STATIC_CALL but not
> HAVE_STATIC_CALL_INLINE, each static call has an out-of-line trampoline
> which will either branch to a callee or return to the caller.
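
A rough userspace model of the out-of-line trampoline described above may help. It is only a sketch under stated assumptions. A C function pointer stands in for the patched trampoline stub, so it models the extra hop every caller takes, not the real machine-code patching. All names (`callee`, `ret0`, `static_call_sketch`, `static_call_update_sketch`) are hypothetical and are not the kernel's static_call API.

```c
#include <stdbool.h>

/* Hypothetical callee that the static call normally targets. */
static int callee(void)
{
	return 1;
}

/* Hypothetical "just return to the caller" target used when disabled. */
static int ret0(void)
{
	return 0;
}

/*
 * Stand-in for the out-of-line trampoline. In the kernel this is a small
 * machine-code stub that gets patched; here a function pointer models the
 * same extra hop between every caller and the real callee.
 */
static int (*tramp_target)(void) = callee;

/* Hypothetical analogue of retargeting the trampoline. */
static void static_call_update_sketch(bool enable)
{
	tramp_target = enable ? callee : ret0;
}

/* Every call goes through the trampoline, enabled or not. */
static int static_call_sketch(void)
{
	return tramp_target();
}
```

In the real mechanism the point is to avoid an indirect call. The function pointer here only illustrates where the trampoline sits relative to the callee.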
>
> On such architectures, a number of constraints can conspire to make
> those trampolines more complicated and potentially less useful than we'd
> like. For example:
>
> * Hardware and software control flow integrity schemes can require the
> addition of "landing pad" instructions (e.g. `BTI` for arm64), which
> will also be present at the "real" callee.
>
> * Limited branch ranges can require that trampolines generate or load an
> address into a register and perform an indirect branch (or at least
> have a slow path that does so). This loses some of the benefits of
> having a direct branch.
>
> * Interaction with SW CFI schemes can be complicated and fragile, e.g.
> requiring that we can recognise idiomatic codegen and remove
> indirections, at least until clang provides more helpful
> mechanisms for dealing with this.
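
To make the branch-range constraint concrete: an arm64 `B`/`BL` encodes a signed 26-bit word offset. A direct branch can therefore only reach targets within roughly +/-128 MiB, and a trampoline whose callee may lie further away must materialise the address and branch indirectly. The check below is a minimal sketch of that arithmetic. `arm64_b_in_range()` is a hypothetical name, not an existing kernel helper.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Can a direct arm64 B/BL placed at 'pc' reach 'target'?
 * imm26 is a signed word offset, so the byte offset must be 4-byte
 * aligned and lie in [-2^27, 2^27 - 4], i.e. roughly +/-128 MiB.
 */
static bool arm64_b_in_range(uint64_t pc, uint64_t target)
{
	/* Two's-complement difference; negative for backward branches. */
	int64_t off = (int64_t)(target - pc);

	if (off & 3)
		return false;	/* instructions are 4-byte aligned */

	return off >= -(INT64_C(1) << 27) && off <= (INT64_C(1) << 27) - 4;
}
```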
>
> For PREEMPT_DYNAMIC, we don't need the full power of static calls, as we
> really only need to enable/disable specific preemption functions. We can
> achieve the same effect without a number of the pain points above by
> using static keys to fold early return cases into the preemption
> functions themselves rather than in an out-of-line trampoline,
> effectively inlining the trampoline into the start of the function.
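
The idea of folding the early return into the function itself can be sketched in C. This is a userspace model under stated assumptions, not the code from the patch. In the kernel the flag would be a static key (e.g. declared with `DEFINE_STATIC_KEY_TRUE()` and tested with `static_branch_unlikely()`), whose test compiles to a patchable `NOP`/`B` like the one in the disassembly below. Here a plain `bool` stands in so the sketch runs anywhere. `need_resched_sketch`, `resched_count`, and the `*_sketch` functions are hypothetical.

```c
#include <stdbool.h>

/* Stand-in for a static key; in the kernel this test is a patched NOP/B. */
static bool sk_cond_resched_enabled = true;

/* Hypothetical stand-in for the real "should we reschedule?" condition. */
static int need_resched_sketch;
/* Counts entries into the slow path (stand-in for preempt_schedule_common()). */
static int resched_count;

static int cond_resched_body(void)
{
	if (!need_resched_sketch)
		return 0;
	resched_count++;
	return 1;
}

/*
 * The "trampoline" lives at the top of the function: when the key is
 * disabled, return 0 immediately, with no separate out-of-line stub.
 */
static int dynamic_cond_resched_sketch(void)
{
	if (!sk_cond_resched_enabled)
		return 0;
	return cond_resched_body();
}
```

Callers always call `dynamic_cond_resched_sketch()` directly. Switching the preemption mode only flips the key, so there is no trampoline for landing pads, branch ranges, or CFI to complicate.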
>
> For arm64, this results in good code generation, e.g. the
> dynamic_cond_resched() wrapper looks as follows (with the first `B` being
> replaced with a `NOP` when the function is disabled):
>
> | <dynamic_cond_resched>:
> | bti c
> | b <dynamic_cond_resched+0x10>
> | mov w0, #0x0 // #0
> | ret
> | mrs x0, sp_el0
> | ldr x0, [x0, #8]
> | cbnz x0, <dynamic_cond_resched+0x8>
> | paciasp
> | stp x29, x30, [sp, #-16]!
> | mov x29, sp
> | bl <preempt_schedule_common>
> | mov w0, #0x1 // #1
> | ldp x29, x30, [sp], #16
> | autiasp
> | ret
>
> ... compared to the regular form of the function:
>
> | <__cond_resched>:
> | bti c
> | mrs x0, sp_el0
> | ldr x1, [x0, #8]
> | cbz x1, <__cond_resched+0x18>
> | mov w0, #0x0 // #0
> | ret
> | paciasp
> | stp x29, x30, [sp, #-16]!
> | mov x29, sp
> | bl <preempt_schedule_common>
> | mov w0, #0x1 // #1
> | ldp x29, x30, [sp], #16
> | autiasp
> | ret
>
> Any architecture which implements static keys should be able to use this
> to implement PREEMPT_DYNAMIC with similar cost to non-inlined static
> calls.
>
> Signed-off-by: Mark Rutland <mark.rutland@xxxxxxx>
> Cc: Ard Biesheuvel <ardb@xxxxxxxxxx>
> Cc: Frederic Weisbecker <frederic@xxxxxxxxxx>
> Cc: Ingo Molnar <mingo@xxxxxxxxxx>
> Cc: Juri Lelli <juri.lelli@xxxxxxxxxx>
> Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>

Does anyone have an opinion on this? Can we do better on the arm64 static call
side, or should we resign ourselves to the static keys approach?

Also, I assume that arm64 will need a static call implementation sooner or
later...

Thanks.