Re: [Patch] perf_event: fix a race condition in perf_remove_from_context()

From: Peter Zijlstra
Date: Mon Sep 01 2014 - 04:38:44 EST


On Thu, Aug 28, 2014 at 04:27:35PM -0700, Cong Wang wrote:
> From: Cong Wang <cwang@xxxxxxxxxxxxxxxx>
>
> We saw a kernel soft lockup in perf_remove_from_context():
> it looks like the `perf` process, when exiting, could not get
> out of the retry loop. Meanwhile, the target process was forking
> a child. So either the target process should execute the SMP
> function call to deactivate the event (if it was running), or it should
> do a context switch, which deactivates the event.
>
> It seems we optimize out a context switch in perf_event_context_sched_out(),
> and, more importantly, we still test an obsolete task pointer when
> retrying, so no one would actually deactivate the event in this situation.
> Fix it directly by reloading the task pointer in perf_remove_from_context().
> This should fix the above soft lockup.



> ---
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index f9c1ed0..c4141a0 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -1524,6 +1524,11 @@ retry:

Please use either:

.gitconfig:

[diff "default"]
xfuncname = "^[[:alpha:]$_].*[^:]$"

.quiltrc:

QUILT_DIFF_OPTS="-F ^[[:alpha:]\$_].*[^:]\$"

> */
> if (ctx->is_active) {
> raw_spin_unlock_irq(&ctx->lock);
> + /*
> + * Reload the task pointer, it might have been changed by
> + * a concurrent perf_event_context_sched_out() without switching
> + */
> + task = ctx->task;
> goto retry;
> }

You forgot to check whether the same error happens in other places (it
does); please fix all of them.