Re: [PATCH 1/5] ptrace: Prepare to fix racy accesses on task breakpoints

From: Frederic Weisbecker
Date: Wed May 04 2011 - 14:22:14 EST


On Wed, May 04, 2011 at 08:31:06AM +0200, Ingo Molnar wrote:
>
> (Linus and Andrew Cc:-ed as well)
>
> * Frederic Weisbecker <fweisbec@xxxxxxxxx> wrote:
>
> > When a task is traced and is in a stopped state, the tracer
> > may execute a ptrace request to examine the tracee state and
> > get its task struct. Right after, the tracee can be killed
> > and thus its breakpoints released.
> > This can happen concurrently when the tracer is in the middle
> > of reading or modifying these breakpoints, leading to dereferencing
> > a freed pointer.
> >
> > Hence, to prepare the fix, create a generic breakpoint reference
> > holding API. When a reference on the breakpoints of a task is
> > held, the breakpoints won't be released until the last reference
> > is dropped. After that, no more ptrace request on the task's
> > breakpoints can be serviced for the tracer.
> >
> > Reported-by: Oleg Nesterov <oleg@xxxxxxxxxx>
> > Signed-off-by: Frederic Weisbecker <fweisbec@xxxxxxxxx>
>
> Ok, this series looks a bit scary - and this ptrace.h change does not have
> Oleg's Acked-by. (the arch bits all have maintainer Acked-by's)
>
> The changes look a bit ugly as well: beyond the ugly ifdeffery, we have
> ptrace.h::ptrace_init_task(), which is only used in
> tracehook.h::tracehook_finish_clone() which is only used in
> kernel/fork.c::copy_process().
>
> That's two levels of obfuscation to do something rather simple - i think we
> should get rid of the tracehook.h redirections, it did not work out in the end
> as a method of capturing events - ftrace TRACE_EVENT() seems better structured
> and more maintainable.
>
> But i guess we could live with this fix for v2.6.39, if neither Oleg nor Linus
> and Andrew are hating this further complication of the ptrace mess enough to
> NAK it. Thoughts?
>
> Plus, i'd really love it if you did some stress-testing of this change with a
> mixed ptrace breakpoints and perf breakpoints workload, on some sufficiently
> SMP box. gdb's hbreak is a very low-frequency way of testing, so such
> regressions take ages to be reported.

But the perf breakpoints (those created using the perf syscall) are not
touched at all by this patch. Only the ptrace ones are.

What I can stress-test is issuing ptrace breakpoint requests while sending
SIGKILL to the child at the same time, which is the only way to reproduce the
bug this is supposed to fix. I can run that in a loop overnight. I'll try that.
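
For anyone following the thread without the patch at hand, the reference
holding idea from the changelog quoted above can be sketched in plain
userspace C. This is only an illustration under my own assumptions, not the
code from the series: struct bp_owner, bp_owner_init(), bp_get() and bp_put()
are hypothetical names, and C11 atomics stand in for the kernel's atomic
helpers. The owner starts with one reference, a reader must take a reference
before touching the breakpoints, and the state is released only when the last
reference is dropped. Once the count has hit zero, no new reference can be
taken.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a task that owns breakpoint state. */
struct bp_owner {
    atomic_int refcnt;  /* starts at 1: the owner's own reference */
    int *bps;           /* placeholder for the breakpoint state */
};

static void bp_owner_init(struct bp_owner *o)
{
    atomic_init(&o->refcnt, 1);
    o->bps = calloc(4, sizeof(*o->bps));
    assert(o->bps != NULL);
}

/*
 * Take a reference, but only if the count has not already dropped to
 * zero. Returns false when the breakpoints are gone (or going away).
 */
static bool bp_get(struct bp_owner *o)
{
    int old = atomic_load(&o->refcnt);

    while (old != 0) {
        /* On failure, 'old' is reloaded with the current value. */
        if (atomic_compare_exchange_weak(&o->refcnt, &old, old + 1))
            return true;
    }
    return false;
}

/* Drop a reference; whoever drops the last one releases the state. */
static void bp_put(struct bp_owner *o)
{
    if (atomic_fetch_sub(&o->refcnt, 1) == 1) {
        free(o->bps);
        o->bps = NULL;
    }
}
```

In this sketch the tracer side would wrap each access in bp_get()/bp_put(),
and the exiting tracee would call bp_put() once to drop the initial
reference. Whichever side drops the last reference does the release, so the
reader never sees freed state.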