Re: [PATCH v3 0/6] Static calls

From: Nadav Amit
Date: Fri Jan 11 2019 - 10:49:06 EST


> On Jan 11, 2019, at 7:15 AM, Josh Poimboeuf <jpoimboe@xxxxxxxxxx> wrote:
>
> On Fri, Jan 11, 2019 at 01:47:01AM +0000, Nadav Amit wrote:
>> Here is an alternative idea (although similar to Steven's and my code).
>>
>> Assume that we always clobber R10, R11 on static-calls explicitly, as anyhow
>> should be done by the calling convention (and gcc plugin should allow us to
>> enforce). Also assume that we hold a table with all source RIP and the
>> matching target.
>>
>> Now, in the int3 handler can you take the faulting RIP and search for it in
>> the "static-calls" table, writing the RIP+5 (offset) into R10 (return
>> address) and the target into R11. You make the int3 handler to divert the
>> code execution by changing pt_regs->rip to point to a new function that does:
>>
>> push R10
>> jmp __x86_indirect_thunk_r11
>>
>> And then you are done. No?
>
> IIUC, that sounds pretty much like what Steven proposed:
>
> https://lkml.kernel.org/r/20181129122000.7fb4fb04@gandalf.local.home

Stupid me. I remembered it slightly differently (the caller saving the
target in a register).

> I liked the idea, BUT, how would it work for callee-saved PV ops? In
> that case there's only one clobbered register to work with (rax).

That would be trickier. How about using per-CPU trampoline code to
hold a direct call to the target, and temporarily disabling preemption (which
might be simpler to do by disabling IRQs):

Static-call modifier:

1. Calls synchronize_sched() to ensure the per-CPU trampoline is not in use
2. Patches the jmp in the per-CPU trampoline (see below)
3. Saves the call source RIP in [per-CPU scratchpad RIP] (below)
4. Configures the int3 handler to use the static-call int3 handler
5. Patches the call target (as it currently does).
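
For reference on step 5 and on the RIP+5 return address mentioned earlier in
the thread, here is a small user-space sketch of how a 5-byte x86
`call rel32` is encoded. It is not kernel code: encode_call() and
call_return_address() are hypothetical helper names, and the addresses are
made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CALL_INSN_SIZE 5      /* opcode byte + 32-bit displacement */
#define CALL_OPCODE    0xE8   /* x86 near call with relative displacement */

/* Address the callee returns to: the instruction right after the call. */
static uint64_t call_return_address(uint64_t call_site)
{
    return call_site + CALL_INSN_SIZE;
}

/* Encode "call target" as it would be laid out at call_site. */
static void encode_call(uint64_t call_site, uint64_t target,
                        uint8_t out[CALL_INSN_SIZE])
{
    int64_t diff;
    uint32_t rel;
    int i;

    assert(out != NULL);

    /* The displacement is relative to the end of the call instruction. */
    diff = (int64_t)(target - call_return_address(call_site));
    assert(diff >= INT32_MIN && diff <= INT32_MAX);
    rel = (uint32_t)(int32_t)diff;

    out[0] = CALL_OPCODE;
    for (i = 0; i < 4; i++)               /* displacement is little-endian */
        out[1 + i] = (uint8_t)(rel >> (8 * i));
}
```

Because the displacement depends on the call site, every call site needs its
own patched bytes, which is why the modifier has to track the source RIP.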

Static-call int3 handler:
1. Changes flags on the stack to keep IRQs disabled on return
2. Jumps to per-cpu trampoline on return

Per-cpu trampoline:
push [per-CPU scratchpad RIP]
sti
jmp [ target ] (this one is patched)

Note that no IRQ should be delivered between the STI and the JMP, because STI
delays interrupt delivery until after the instruction that follows it (the
STI interrupt shadow).
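
The flow above can be modeled roughly in user-space C. This is only a toy
sketch, not kernel code: struct toy_regs, struct toy_percpu and all function
names are hypothetical, and it assumes that the value pushed by the
trampoline is the return address, i.e. the call-site RIP plus 5.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define X86_EFLAGS_IF  0x200ULL  /* interrupt-enable flag, bit 9 of RFLAGS */
#define CALL_INSN_SIZE 5         /* size of a call rel32 instruction */
#define TOY_STACK_SLOTS 8

/* Hypothetical stand-in for the register state saved at the int3. */
struct toy_regs {
    uint64_t ip;
    uint64_t flags;
    uint64_t stack[TOY_STACK_SLOTS];
    size_t sp;                   /* number of used stack slots */
};

/* Hypothetical stand-in for the per-CPU trampoline and scratchpad. */
struct toy_percpu {
    uint64_t trampoline_addr;    /* where the trampoline code lives */
    uint64_t scratch_rip;        /* value the trampoline pushes */
    uint64_t target;             /* destination of the patched jmp */
};

/* Modifier steps 2 and 3: patch the jmp, save the return address. */
static void toy_modifier_prepare(struct toy_percpu *pc, uint64_t call_site,
                                 uint64_t new_target)
{
    pc->target = new_target;
    pc->scratch_rip = call_site + CALL_INSN_SIZE;  /* assumption: RIP+5 */
}

/* int3 handler: return with IRQs off, straight into the trampoline. */
static void toy_int3_handler(struct toy_regs *regs,
                             const struct toy_percpu *pc)
{
    regs->flags &= ~X86_EFLAGS_IF;
    regs->ip = pc->trampoline_addr;
}

/* Trampoline: push [scratchpad RIP]; sti; jmp [target]. */
static void toy_trampoline(struct toy_regs *regs,
                           const struct toy_percpu *pc)
{
    assert(regs->sp < TOY_STACK_SLOTS);
    regs->stack[regs->sp++] = pc->scratch_rip;  /* emulated return address */
    regs->flags |= X86_EFLAGS_IF;               /* sti */
    regs->ip = pc->target;                      /* patched jmp */
}
```

The model leaves out the part that matters most on real hardware: keeping
IRQs disabled from the int3 return until the STI is what prevents the task
from being preempted or migrated while it depends on the per-CPU scratchpad.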

What do you say?