Re: [PATCH] x86/kprobe: unbalanced preempt counter with CONFIG_PREEMPT=y
From: Masami Hiramatsu
Date:  Fri Mar 02 2018 - 20:21:11 EST
On Fri, 2 Mar 2018 09:36:06 +0100 (CET)
Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
> On Fri, 2 Mar 2018, Masami Hiramatsu wrote:
> > On Thu, 1 Mar 2018 22:12:56 +0100 (CET)
> > Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
> > > > For each kprobe hook, preempt_disable() is called twice, but
> > > > preempt_enable() is called once, when CONFIG_PREEMPT=y. No
> > > > matter kprobe uses ftrace or int3. Then the preempt counter
> > > > overflows, the process might be preempted and migrate to another
> > > > cpu. It causes the kprobe int3 being treated as not kprobe's int3,
> > > > kernel dies.
> > > > 
> > > > The backtrace looks like this:
> > > > 
> > > > int3: 0000 [#1] PREEMPT SMP
> > > > Modules linkedd in: ...
> > > > CPU: 1 PID: 2618 COMM: ...
> > > > Hardware: ...
> > > > task: ...
> > > > RIP: 0010[<ffffffff81457cd5>] [<ffffffff81457cd5>] jprobe_return_end+0x0/0xb
> > > > ...
> > > > RIP [<ffffffff81457cd5>] jprobe_return_end+0x0/0xb
> > > > RSP ...
> > > > 
> > > > Signed-off-by: Yang Bo <yangbo@xxxxxxxxxx>
> > > > ---
> > > >  arch/x86/kernel/kprobes/core.c | 1 +
> > > >  1 file changed, 1 insertion(+)
> > > > 
> > > > diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
> > > > index bd36f3c33cd0..1ea5c9caa8f1 100644
> > > > --- a/arch/x86/kernel/kprobes/core.c
> > > > +++ b/arch/x86/kernel/kprobes/core.c
> > > > @@ -611,6 +611,7 @@ static void setup_singlestep(struct kprobe *p, struct pt_regs *regs,
> > > >  		regs->ip = (unsigned long)p->addr;
> > > >  	else
> > > >  		regs->ip = (unsigned long)p->ainsn.insn;
> > > > +	preempt_enable_no_resched();
> > > 
> > > Good catch, but to be honest the proper fix is to move the preempt_enable()
> > > invocation to the callsite. This kind of asymetric handling is error prone
> > > and leads exactly to that type of bugs. Just adding more preempt_enable()
> > > invocations is proliferating the problem.
> > 
> > Oops, I missed to send my NACK not to ML...
> > 
> > ----
> > > NACK. While single-stepping we have to disable preemption, this
> > > is balanced right after debug-exception is handled (which you
> > > removed next patch).
> > > 
> > > kprobes does not use only int3, but also debug (single step) exception.
> > > So, basically its sequence is;
> > > 1) disable preemption at int3 handler and return
> > > 2) do single-step on returned instruction
> > > 3) enable preemption at debug exception handler
> 
> Well, but that does not make the crash Yang is seeing magically go
> away. There must be a bug in that logic somewhere and that needs to be
> addressed.
As Yang said, this bug had been fixed by 
commit 5bb4fc2d8641 ("Disable preemption in ftrace-based jprobes")
> 
> > > Looking deeper into kprobe_int3_handler() there are more places which do
> > > this completely unomprehensible conditional preempt count handling via
> > > skip_singlestep() and setup_detour_execution(). What an unholy mess.
> > > 
> > > Masami?
> > 
> > As I pointed above, since kprobe has to disable preempt while singlestepping,
> > preempt_disable()/enable() must be separated in int3 handler and debug handler
> > by default. There are some exception paths for skipping singlestep to speed up
> > the process. Maybe I can make it better to handle it in kprobe_int3_handler()
> >
> > E.g. check singlestep setup in the end of kprobe_int3_handler() and decide
> >  to enable preempt again or keep disabled.
> > 
> > Does it match to your thought?
> 
> Yes. Doing it at the callsite and documenting the rules proper is much
> preferred. Right now the code looks like:
> 
> 
> 	preempt_disable();
> 	if (foo) {
> 		if (baz())
> 			return;
> 		....
> 		return;
> 	} else {
> 		more of this....
> 	}
> 	preempt_enable();
> 
> And the preempt_enable() is conditionally inside of baz() and other
> functions, which is completely non-obvious.
> 
> 	preempt_disable();
> 	if (foo) {
> 		if (baz()) {
> 			preempt_enable();
> 			return;
> 		}
> 		....
> 		/* Keep preemption disabled across the probe operation */
> 		return;
> 	} else {
> 		more of this....
> 	}
> 	preempt_enable();
> 
> That's a bit more understandable, but still confusing.
> 
> Preemption needs to be disabled when the probe hits and reenabled when the
> probe operation finishes, right?
> 
> So I'd rather have an indicator for probe in progress and use that to
> control preemption:
> 
> handler()
> {
> 	if (!current->probe_active) {
> 		if (is_probe()) {
> 			do_stuff();
> 			current->probe_active = true;
> 			/* Keep preemption disabled across the probe operation */
> 			preempt_disable();
> 			return 1;
> 		}
> 		return 0;
> 	}
> 
> 	if (probe_done()) {
> 	   	do_other_stuff();
> 		current->probe_active = false;
> 		preempt_enable();
> 		return 1;
> 	}
> 	do_more_stuff();
> 	return 1;
> }
> 
> See how that automatically documents the mode of operation and the
> preemption semantics? No need for all that conditional preempt_enable()
> stuff all over the place. It's in reality probably not that simple, but you
> get the idea.
Ok, my idea will be a bit simpler.
handler()
{
	preempt_disable();
	ret = do_kprobe();
	if (!prepared_for_singlestep())
		preempt_enable();
	return ret;
}
This shows in what case we keep preempt disabled, and give a hint
where it is enabled again :)
Thanks,
-- 
Masami Hiramatsu <mhiramat@xxxxxxxxxx>