Re: [PATCH 3.2 114/126] perf/core: Fix concurrent sys_perf_event_open() vs. 'move_group' race

From: Ben Hutchings
Date: Mon Feb 20 2017 - 19:47:01 EST


On Wed, 2017-02-15 at 22:41 +0000, Ben Hutchings wrote:
> 3.2.85-rc1 review patch.  If anyone has any objections, please let me know.
>
> ------------------
>
> From: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
>
> commit 321027c1fe77f892f4ea07846aeae08cefbbb290 upstream.
>
> Di Shen reported a race between two concurrent sys_perf_event_open()
> calls where both try and move the same pre-existing software group
> into a hardware context.
>
> The problem is exactly that described in commit:
>
>   f63a8daa5812 ("perf: Fix event->ctx locking")
>
> ... where, while we wait for a ctx->mutex acquisition, the event->ctx
> relation can have changed under us.
>
> That very same commit failed to recognise sys_perf_event_context() as an
> external access vector to the events and thereby didn't apply the
> established locking rules correctly.
>
> So while one sys_perf_event_open() call is stuck waiting on
> mutex_lock_double(), the other (which owns said locks) moves the group
> about. So by the time the former sys_perf_event_open() acquires the
> locks, the context we've acquired is stale (and possibly dead).
>
> Apply the established locking rules as per perf_event_ctx_lock_nested()
> to the mutex_lock_double() for the 'move_group' case. This obviously means
> we need to validate state after we acquire the locks.
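The locking rule referred to above can be sketched in user-space C: pin the context you read from the event, take both mutexes, then re-check that the event still points at that context, and back out and retry if it does not. This is a simplified model under stated assumptions, not the kernel code: struct ctx, struct event, get_ctx_not_zero(), lock_double(), lock_double_revalidate() and unlock_gctx() are hypothetical stand-ins built on pthreads and C11 atomics, and the caller is assumed to pass two distinct contexts.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for perf_event_context and perf_event. */
struct ctx {
    pthread_mutex_t mutex;
    atomic_int refcount;
};

struct event {
    struct ctx *_Atomic ctx;   /* may be repointed by a concurrent "move" */
};

/* Take a reference only if the context is not already being torn down. */
static bool get_ctx_not_zero(struct ctx *c)
{
    int old = atomic_load(&c->refcount);
    while (old != 0)
        if (atomic_compare_exchange_weak(&c->refcount, &old, old + 1))
            return true;
    return false;
}

/* Drop a reference; a real implementation would free the context at zero. */
static void put_ctx(struct ctx *c)
{
    atomic_fetch_sub(&c->refcount, 1);
}

/* Lock two distinct mutexes in address order so callers cannot ABBA-deadlock. */
static void lock_double(pthread_mutex_t *a, pthread_mutex_t *b)
{
    if ((uintptr_t)b < (uintptr_t)a) {
        pthread_mutex_t *t = a;
        a = b;
        b = t;
    }
    pthread_mutex_lock(a);
    pthread_mutex_lock(b);
}

/*
 * Lock the leader's current context together with ctx, then verify the
 * leader was not moved while we slept on the mutexes.  On success the
 * group context is returned locked, with an extra reference held.
 * Assumes leader->ctx != ctx.
 */
static struct ctx *lock_double_revalidate(struct event *leader, struct ctx *ctx)
{
    struct ctx *gctx;
again:
    gctx = atomic_load(&leader->ctx);
    if (!get_ctx_not_zero(gctx))
        goto again;   /* dying context: relies on another thread repointing leader->ctx */
    lock_double(&gctx->mutex, &ctx->mutex);
    if (atomic_load(&leader->ctx) != gctx) {
        /* Stale: the group was moved under us, so undo everything and retry. */
        pthread_mutex_unlock(&ctx->mutex);
        pthread_mutex_unlock(&gctx->mutex);
        put_ctx(gctx);
        goto again;
    }
    return gctx;
}

/* Counterpart: release the group context's mutex and the pin taken above. */
static void unlock_gctx(struct ctx *gctx)
{
    pthread_mutex_unlock(&gctx->mutex);
    put_ctx(gctx);
}
```

Taking the reference before sleeping on the mutexes is what keeps the old context from being freed under the waiter; the re-check after locking is what catches a group that another caller moved in the meantime.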
[...]
>  /*
>   * See perf_event_ctx_lock() for comments on the details
>   * of swizzling perf_event::ctx.
>   */
> - mutex_lock_double(&gctx->mutex, &ctx->mutex);
> -
>  perf_remove_from_context(group_leader, false);
>
>  /*
> @@ -6709,10 +6757,8 @@ SYSCALL_DEFINE5(perf_event_open,
>  ++ctx->generation;
>  perf_unpin_context(ctx);
>
> - if (move_group) {
> - mutex_unlock(&gctx->mutex);
> - put_ctx(gctx);
> - }
> + if (move_group)
> + perf_event_ctx_unlock(group_leader, gctx);
>  mutex_unlock(&ctx->mutex);
>
>  event->owner = current;
[...]

Peter has clarified that the last call to put_ctx(gctx) corresponds to
the reference cleared by perf_remove_from_context(group_leader, false)
above. So although perf_event_ctx_unlock() also calls put_ctx(gctx),
we really do want to drop two references here and should keep the
direct call.
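The reference arithmetic can be checked with a toy counter. This is a hypothetical single-threaded model, not kernel code: the starting count of 2 (one reference held by the group leader, one held elsewhere) is an assumption for illustration, and move_group_refs() only mimics which put_ctx() calls the backport makes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the reference count on the old group context. */
struct ctx {
    int refcount;
};

static void get_ctx(struct ctx *c) { c->refcount++; }
static void put_ctx(struct ctx *c) { c->refcount--; }

/*
 * Mimics the move_group path after the fix.  keep_direct_put selects
 * whether the separate put_ctx(gctx) is kept alongside the unlock
 * helper's own put_ctx().
 */
static void move_group_refs(struct ctx *gctx, bool keep_direct_put)
{
    get_ctx(gctx);        /* lock helper pins gctx while waiting on its mutex */

    /* ... group leader is removed from gctx and installed elsewhere,
     * so the reference it held on gctx is no longer needed ... */

    if (keep_direct_put)
        put_ctx(gctx);    /* drop the group leader's old reference */
    put_ctx(gctx);        /* unlock helper drops the pin taken above */
}
```

Starting from 2, keeping both calls brings the count back to 1; dropping the direct call leaves it at 2, i.e. one leaked reference per moved group.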

I made the same error when backporting to 3.16, and will fix that as
well.

Ben.

--
Ben Hutchings
73.46% of all statistics are made up.
