Re: [RFC PATCH 1/1] x86/ftrace: make ftrace_int3_handler() not to skip fops invocation

From: Steven Rostedt
Date: Wed Apr 24 2019 - 08:35:31 EST


On Wed, 24 Apr 2019 08:20:12 +0200
Nicolai Stange <nstange@xxxxxxx> wrote:

> Alright, if there's a use case beyond live patching, I can try to handle
> 32 bits alongside, of course.
>
> However, some care will be needed when determining the actual context
> from ftrace_int3_handler(): tracing anything before the addition or
> after the subtraction of HARDIRQ_OFFSET to/from preempt_count in
> irq_enter() resp. irq_exit() could otherwise clobber the "normal" state
> in thread_info, correct?
>
> How about having a fixed size stack with three entries shared between
> "normal", "irq" and "softirq" in thread_info, as well as a dedicated
> slot for nmi context? (The stack would be modified/consumed only with
> irqs disabled).
>
> Which makes me wonder:
> - if we disabled preemption from ftrace_int3_handler() and reenabled it
> again from the *_int3_tramp*s, these stacks could be made per-CPU,
> AFAICS,

Heh, here you go again, having some of the ideas I came up with as well.

I thought about this use per-cpu variables and this preempt disable
hack as well. But thinking more on it, it goes into the "too crazy, and
too fragile" category. It breaks a lot of norms (but I do that a lot
anyway ;) and decided that it would probably not be accepted by others.

I much rather use the above thread info solution. As for the
irq_enter/exit() issue, instead of using the preempt_count context, we
could just have a counter in the thread info as well:

depth = local_add_return(thread_info->ftrace_int_depth);
thread_info->slot[depth] = *(pt_regs->sp);

Where thread_info->slot is an array of 4 long words.

And then in the trampoline: (pseudo code)

push %rax // just a place holder
push %rax
push %rdx
mov thread->depth, %rax
mov thread->slot(%rax), %rdx
mov %rdx, 16(%rsp)
add -1,%rax
mov %rax, thread->depth
pop %rdx
pop %rax
jmp ftrace_caller

If an interrupt were to come in at any time, it would just increment
the depth, use the next slot, and then decrement it back to what the
depth was when it interrupted the code.

-- Steve

> - and why not use this strategy on x86_64, too? The advantages would be
> a common implemention between 32 and 64 bit and more importantly, not
> relying on that %r11 hack. When doing the initial RFC patch, it
> wasn't clear to me that at most one stack slot per context would be
> needed, i.e. only four in total...