Re: [RFC PATCH 1/3] static_call: Add static call infrastructure

From: Ard Biesheuvel
Date: Fri Nov 09 2018 - 12:52:33 EST


On 9 November 2018 at 18:46, Josh Poimboeuf <jpoimboe@xxxxxxxxxx> wrote:
> On Fri, Nov 09, 2018 at 06:33:03PM +0100, Ard Biesheuvel wrote:
>> On 9 November 2018 at 18:31, Josh Poimboeuf <jpoimboe@xxxxxxxxxx> wrote:
>> > On Fri, Nov 09, 2018 at 06:25:24PM +0100, Ard Biesheuvel wrote:
>> >> On 9 November 2018 at 16:14, Ard Biesheuvel <ard.biesheuvel@xxxxxxxxxx> wrote:
>> >> > On 9 November 2018 at 16:10, Josh Poimboeuf <jpoimboe@xxxxxxxxxx> wrote:
>> >> >> On Fri, Nov 09, 2018 at 02:39:17PM +0100, Ard Biesheuvel wrote:
>> >> >>> > + for (site = start; site < stop; site++) {
>> >> >>> > + struct static_call_key *key = static_call_key(site);
>> >> >>> > + unsigned long addr = static_call_addr(site);
>> >> >>> > +
>> >> >>> > + if (list_empty(&key->site_mods)) {
>> >> >>> > + struct static_call_mod *mod;
>> >> >>> > +
>> >> >>> > + mod = kzalloc(sizeof(*mod), GFP_KERNEL);
>> >> >>> > + if (!mod) {
>> >> >>> > + WARN(1, "Failed to allocate memory for static calls");
>> >> >>> > + return;
>> >> >>> > + }
>> >> >>> > +
>> >> >>> > + mod->sites = site;
>> >> >>> > + list_add_tail(&mod->list, &key->site_mods);
>> >> >>> > +
>> >> >>> > + /*
>> >> >>> > + * The trampoline should no longer be used. Poison it
>> >> >>> > + * it with a BUG() to catch any stray callers.
>> >> >>> > + */
>> >> >>> > + arch_static_call_poison_tramp(addr);
>> >> >>>
>> >> >>> This patches the wrong thing: the trampoline is at key->func not addr.
>> >> >>
>> >> >> If you look at the x86 implementation, it actually does poison the
>> >> >> trampoline.
>> >> >>
>> >> >> The address of the trampoline isn't actually known here. key->func
>> >> >> isn't the trampoline address; it's the destination func address.
>> >> >>
>> >> >> So instead I passed the address of the call instruction. The arch code
>> >> >> then reads the instruction to find the callee (the trampoline).
>> >> >>
>> >> >> The code is a bit confusing. To make it more obvious, maybe we should
>> >> >> add another arch function to read the call destination. Then this code
>> >> >> can pass that into arch_static_call_poison_tramp().
>> >> >>
>> >> >
>> >> > Ah right, so I am basically missing a dereference in my
>> >> > arch_static_call_poison_tramp() code if this breaks.
>> >> >
>> >>
>> >> Could we call it 'defuse' rather than 'poision'? On arm64, we will
>> >> need to keep it around to bounce function calls that are out of range,
>> >> and replace it with a PLT sequence.
>> >
>> > Ok, but doesn't that defeat the purpose of the inline approach?
>> >
>>
>> It does. But this only occurs when a module is loaded far away, and
>> this will only happen if you have 2 GB range KASLR enabled, or your
>> 128 MB module region gets exhausted for some reason, so the majority
>> of calls should use a single relative branch.
>
> Makes sense. Do you also account for the possibility that the original
> call emitted by GCC was far away and thus used the PLT?
>

That doesn't really happen. vmlinux is never over 128 MB, and the
modules are only partially linked, so the linker never gets to the
stage where it needs to emit veneers.

It's a bit fiddly since inline and out-of-line both use
arch_static_call_transform(), but what I need to do is basically:

- for out-of-line, the trampoline needs to be patched into a
movn/movk/movk/br sequence if the target is too far
- for inline, the trampoline itself needs to be patched from
adrp/ldr/br (which does a load and a indirect branch) to
movn/movk/movk/br (which uses immediates), and the call sites need to
be patched into calls to the veneer if the target is out of range.

So arch_static_call_transform() needs to know where the trampoline is
for this use case, so could we perhaps add a 'void *orig' in the key
struct that keeps track of the original value of 'addr' ?