Re: [tip:perf/urgent] perf: Fix find_get_context() vsperf_event_exit_task() race

From: Ingo Molnar
Date: Fri Jan 21 2011 - 10:30:17 EST



* tip-bot for Oleg Nesterov <oleg@xxxxxxxxxx> wrote:

> Commit-ID: dbe08d82ce3967ccdf459f7951d02589cf967300
> Gitweb: http://git.kernel.org/tip/dbe08d82ce3967ccdf459f7951d02589cf967300
> Author: Oleg Nesterov <oleg@xxxxxxxxxx>
> AuthorDate: Wed, 19 Jan 2011 19:22:07 +0100
> Committer: Ingo Molnar <mingo@xxxxxxx>
> CommitDate: Wed, 19 Jan 2011 20:04:27 +0100
>
> perf: Fix find_get_context() vs perf_event_exit_task() race
>
> find_get_context() must not install the new perf_event_context
> if the task has already passed perf_event_exit_task().
>
> If nothing else, this means the memory leak. Initially
> ctx->refcount == 2, it is supposed that
> perf_event_exit_task_context() should participate and do the
> necessary put_ctx().
>
> find_lively_task_by_vpid() checks PF_EXITING but this buys
> nothing, by the time we call find_get_context() this task can be
> already dead. To the point, cmpxchg() can succeed when the task
> has already done the last schedule().
>
> Change find_get_context() to populate task->perf_event_ctxp[]
> under task->perf_event_mutex, this way we can trust PF_EXITING
> because perf_event_exit_task() takes the same mutex.
>
> Also, change perf_event_exit_task_context() to use
> rcu_dereference(). Probably this is not strictly needed, but
> with or without this change find_get_context() can race with
> setup_new_exec()->perf_event_exit_task(), rcu_dereference()
> looks better.
>
> Signed-off-by: Oleg Nesterov <oleg@xxxxxxxxxx>
> Acked-by: Peter Zijlstra <a.p.zijlstra@xxxxxxxxx>
> Cc: Alan Stern <stern@xxxxxxxxxxxxxxxxxxx>
> Cc: Arnaldo Carvalho de Melo <acme@xxxxxxxxxx>
> Cc: Frederic Weisbecker <fweisbec@xxxxxxxxx>
> Cc: Paul Mackerras <paulus@xxxxxxxxx>
> Cc: Prasad <prasad@xxxxxxxxxxxxxxxxxx>
> Cc: Roland McGrath <roland@xxxxxxxxxx>
> LKML-Reference: <20110119182207.GB12183@xxxxxxxxxx>
> Signed-off-by: Ingo Molnar <mingo@xxxxxxx>
> ---
> kernel/perf_event.c | 34 ++++++++++++++++++++--------------
> 1 files changed, 20 insertions(+), 14 deletions(-)
>
> diff --git a/kernel/perf_event.c b/kernel/perf_event.c
> index 84522c7..4ec55ef 100644
> --- a/kernel/perf_event.c
> +++ b/kernel/perf_event.c
> @@ -2201,13 +2201,6 @@ find_lively_task_by_vpid(pid_t vpid)
> if (!task)
> return ERR_PTR(-ESRCH);
>
> - /*
> - * Can't attach events to a dying task.
> - */
> - err = -ESRCH;
> - if (task->flags & PF_EXITING)
> - goto errout;
> -
> /* Reuse ptrace permission checks for now. */
> err = -EACCES;
> if (!ptrace_may_access(task, PTRACE_MODE_READ))
> @@ -2268,14 +2261,27 @@ retry:
>
> get_ctx(ctx);
>
> - if (cmpxchg(&task->perf_event_ctxp[ctxn], NULL, ctx)) {
> - /*
> - * We raced with some other task; use
> - * the context they set.
> - */
> + err = 0;
> + mutex_lock(&task->perf_event_mutex);
> + /*
> + * If it has already passed perf_event_exit_task().
> + * we must see PF_EXITING, it takes this mutex too.
> + */
> + if (task->flags & PF_EXITING)
> + err = -ESRCH;
> + else if (task->perf_event_ctxp[ctxn])
> + err = -EAGAIN;
> + else
> + rcu_assign_pointer(task->perf_event_ctxp[ctxn], ctx);
> + mutex_unlock(&task->perf_event_mutex);
> +
> + if (unlikely(err)) {
> put_task_struct(task);
> kfree(ctx);
> - goto retry;
> +
> + if (err == -EAGAIN)
> + goto retry;
> + goto errout;
> }
> }
>
> @@ -6127,7 +6133,7 @@ static void perf_event_exit_task_context(struct task_struct *child, int ctxn)
> * scheduled, so we are now safe from rescheduling changing
> * our context.
> */
> - child_ctx = child->perf_event_ctxp[ctxn];
> + child_ctx = rcu_dereference(child->perf_event_ctxp[ctxn]);
> task_ctx_sched_out(child_ctx, EVENT_ALL);
>
> /*

hm, this one's causing:

[ 25.557579] ===================================================
[ 25.561361] [ INFO: suspicious rcu_dereference_check() usage. ]
[ 25.561361] ---------------------------------------------------
[ 25.561361] kernel/perf_event.c:6136 invoked rcu_dereference_check() without protection!
[ 25.561361]
[ 25.561361] other info that might help us debug this:
[ 25.561361]
[ 25.561361]
[ 25.561361] rcu_scheduler_active = 1, debug_locks = 0
[ 25.561361] no locks held by true/1397.
[ 25.561361]
[ 25.561361] stack backtrace:
[ 25.561361] Pid: 1397, comm: true Not tainted 2.6.38-rc1-tip+ #86752
[ 25.561361] Call Trace:
[ 25.561361] [<ffffffff8106cd98>] ? lockdep_rcu_dereference+0xaa/0xb3
[ 25.561361] [<ffffffff810b34ee>] ? perf_event_exit_task+0x118/0x22a
[ 25.561361] [<ffffffff811133b8>] ? free_fs_struct+0x44/0x48
[ 25.561361] [<ffffffff810434ef>] ? do_exit+0x2c8/0x770
[ 25.561361] [<ffffffff813a52ed>] ? retint_swapgs+0xe/0x13
[ 25.561361] [<ffffffff81043c3c>] ? do_group_exit+0x82/0xad
[ 25.561361] [<ffffffff81043c7e>] ? sys_exit_group+0x17/0x1b
[ 25.561361] [<ffffffff81002acb>] ? system_call_fastpath+0x16/0x1b

Any ideas?

Thanks,

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/