Re: [PATCH 13/15] x86/static_call: Add inline static call implementation for x86-64

From: Nadav Amit
Date: Mon Jun 10 2019 - 15:25:05 EST


> On Jun 10, 2019, at 11:55 AM, Josh Poimboeuf <jpoimboe@xxxxxxxxxx> wrote:
>
> On Mon, Jun 10, 2019 at 06:45:52PM +0000, Nadav Amit wrote:
>>> On Jun 10, 2019, at 11:33 AM, Josh Poimboeuf <jpoimboe@xxxxxxxxxx> wrote:
>>>
>>> On Wed, Jun 05, 2019 at 03:08:06PM +0200, Peter Zijlstra wrote:
>>>> --- a/arch/x86/include/asm/static_call.h
>>>> +++ b/arch/x86/include/asm/static_call.h
>>>> @@ -2,6 +2,20 @@
>>>> #ifndef _ASM_STATIC_CALL_H
>>>> #define _ASM_STATIC_CALL_H
>>>>
>>>> +#include <asm/asm-offsets.h>
>>>> +
>>>> +#ifdef CONFIG_HAVE_STATIC_CALL_INLINE
>>>> +
>>>> +/*
>>>> + * This trampoline is only used during boot / module init, so it's safe to use
>>>> + * the indirect branch without a retpoline.
>>>> + */
>>>> +#define __ARCH_STATIC_CALL_TRAMP_JMP(key, func) \
>>>> + ANNOTATE_RETPOLINE_SAFE \
>>>> + "jmpq *" __stringify(key) "+" __stringify(SC_KEY_func) "(%rip) \n"
>>>> +
>>>> +#else /* !CONFIG_HAVE_STATIC_CALL_INLINE */
>>>
>>> I wonder if we can simplify this (and drop the indirect branch) by
>>> getting rid of the above cruft, and instead just use the out-of-line
>>> trampoline as the default for inline as well.
>>>
>>> Then the inline case could fall back to the out-of-line implementation
>>> (by patching the trampoline's jmp dest) before static_call_initialized
>>> is set.
>>
>> I must be missing some context - but what guarantees that this indirect
>> branch would be exactly 5 bytes long? Isnât there an assumption that this
>> would be the case? Shouldnât there be some handling of the padding?
>
> We don't patch the indirect branch. It's just part of a temporary
> trampoline which is called by the call site, and which does "jmp
> key->func" during boot until static call initialization is done.
>
> (Though I'm suggesting removing that.)

Oh... I see.

On another note - even if this branch is only executed during module
initialization, it does seem safer to use a retpoline instead of an indirect
branch (consider a branch that is run many times on one hardware thread on
SMT, when STIBP is not set, and attacker code is run on the second thread).

I guess you donât simply call the retpoline code so since you donât have a
clobbered register to hold the target. But it still seems possible to use
a retpoline. Anyhow, it might be a moot discussion if this code is removed.