Re: [PATCH 08/15] x86/alternatives: Teach text_poke_bp() to emulate instructions

From: Nadav Amit
Date: Wed Jun 12 2019 - 15:49:09 EST


> On Jun 11, 2019, at 8:55 AM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
> On Tue, Jun 11, 2019 at 11:22:54AM -0400, Steven Rostedt wrote:
>> On Tue, 11 Jun 2019 10:03:07 +0200
>> Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>>
>>
>>> So what happens is that arch_prepare_optimized_kprobe() <-
>>> copy_optimized_instructions() copies however much of the instruction
>>> stream is required such that we can overwrite the instruction at @addr
>>> with a 5 byte jump.
>>>
>>> arch_optimize_kprobe() then does the text_poke_bp() that replaces the
>>> instruction @addr with int3, copies the rel jump address and overwrites
>>> the int3 with jmp.
>>>
>>> And I'm thinking the problem is with something like:
>>>
>>> @addr: nop nop nop nop nop
>>
>> What would work would be to:
>>
>> add breakpoint to first opcode.
>>
>> call synchronize_tasks();
>>
>> /* All tasks now hitting breakpoint and jumping over affected
>> code */
>>
>> update the rest of the instructions.
>>
>> replace breakpoint with jmp.
>>
>> One caveat is that the replaced instructions must not be a call
>> function. As if the call function calls schedule then it will
>> circumvent the synchronize_tasks(). It would be OK if that call is the
>> last of the instructions. But I doubt we modify anything more then a
>> call size anyway, so this should still work for all current instances.
>
> Right, something like this could work (although I cannot currently find
> synchronize_tasks), but it would make the optprobe stuff fairly slow
> (iirc this sync_tasks() thing could be pretty horrible).

I have run into similar problems before.

I had two problematic scenarios. In the first case, I had a "call" in the
middle of the patched code-block, but this call was always followed by a
"jump" to the end of the potentially patched code-block, so I did not have
the problem.

In the second case, I had an indirect call (which is shorter than a direct
call) being patched into a direct call. In this case, I preceded the
indirect call with NOPs so indeed the indirect call was at the end of the
patched block.
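
To make the second case concrete, here is a minimal user-space sketch of
the layout, not the code I actually used. The helper name
pad_insn_to_slot_end() and the 5-byte slot are assumptions for
illustration; the slot size simply matches a direct call (e8 + rel32).
The point is that the short instruction is right-aligned, so the return
address of the call is the end of the block both before and after
patching.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NOP_OPCODE 0x90	/* single-byte x86 NOP */

/*
 * Hypothetical helper: fill the front of @slot with NOPs and place @insn
 * at the very end, so that the instruction ends exactly where the slot
 * ends. Returns 0 on success, -1 if @insn does not fit in the slot.
 */
static int pad_insn_to_slot_end(uint8_t *slot, size_t slot_len,
				const uint8_t *insn, size_t insn_len)
{
	size_t pad;

	if (insn_len > slot_len)
		return -1;

	pad = slot_len - insn_len;
	memset(slot, NOP_OPCODE, pad);		/* leading NOPs */
	memcpy(slot + pad, insn, insn_len);	/* insn ends at slot end */
	return 0;
}
```

For example, with "call *%rax" (ff d0) and a 5-byte slot, this yields
90 90 90 ff d0.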

In certain cases, if a shorter instruction might later be patched into a
longer one, the shorter one can be preceded by some prefixes. If there are
multiple REX prefixes, for instance, the CPU only uses the last one, IIRC.
This can make it possible to avoid synchronize_sched() when patching a
single instruction into another instruction with a different length.
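
A rough sketch of the prefix idea, under stated assumptions: the helper
prefix_pad_insn() is hypothetical, and the caller is responsible for
choosing a prefix byte that the CPU really ignores for the instruction in
question (that needs to be checked against the SDM; I am not claiming any
particular prefix is safe). Unlike NOP padding, the result is still a
single instruction, so it must stay within the 15-byte x86 instruction
length limit.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define X86_MAX_INSN_LEN 15	/* architectural limit on instruction length */

/*
 * Hypothetical helper: grow @insn to @target_len bytes by prepending
 * copies of @prefix. Whether @prefix is harmless for @insn is an
 * assumption the caller must verify. Returns 0 on success, -1 if the
 * request cannot be satisfied.
 */
static int prefix_pad_insn(uint8_t *out, size_t target_len, uint8_t prefix,
			   const uint8_t *insn, size_t insn_len)
{
	size_t npad;

	if (insn_len > target_len || target_len > X86_MAX_INSN_LEN)
		return -1;

	npad = target_len - insn_len;
	memset(out, prefix, npad);		/* redundant prefixes first */
	memcpy(out + npad, insn, insn_len);	/* then the original bytes */
	return 0;
}
```

Since the padded form has the same length as its replacement, there is no
instruction boundary inside the patched range for another CPU to land on.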

Not sure how helpful this information is, but sharing - just in case.