Re: [PATCH] perf/core: fix group {cpu,task} validation

From: Mark Rutland
Date: Fri Jun 23 2017 - 06:09:04 EST


Hi,

On Fri, Jun 23, 2017 at 08:56:38AM +0800, zhouchengming wrote:
> On 2017/6/22 22:41, Mark Rutland wrote:
> >Regardless of which events form a group, it does not make sense for the
> >events to target different tasks and/or CPUs, as this leaves the group
> >inconsistent and impossible to schedule. The core perf code assumes that
> >these are consistent across (successfully initialised) groups.
> >
> >Core perf code only verifies this when moving SW events into a HW
> >context. Thus, we can violate this requirement for pure SW groups and
> >pure HW groups, unless the relevant PMU driver happens to perform this
> >verification itself. These mismatched groups subsequently wreak havoc
> >elsewhere.
> >
> >For example, we handle watchpoints as SW events, and reserve watchpoint
> >HW on a per-cpu basis at pmu::event_init() time to ensure that any event
> >that is initialised is guaranteed to have a slot at pmu::add() time.
> >However, the core code only checks the group leader's cpu filter (via
> >event_filter_match()), and can thus install follower events onto CPUs
> >violating their (mismatched) CPU filters, potentially installing them
> >into a CPU without sufficient reserved slots.

[...]

> >Fix this by validating this requirement regardless of whether we're
> >moving events.
> >
> >Signed-off-by: Mark Rutland<mark.rutland@xxxxxxx>
> >Cc: Alexander Shishkin<alexander.shishkin@xxxxxxxxxxxxxxx>
> >Cc: Arnaldo Carvalho de Melo<acme@xxxxxxxxxx>
> >Cc: Ingo Molnar<mingo@xxxxxxxxxx>
> >Cc: Peter Zijlstra<peterz@xxxxxxxxxxxxx>
> >Cc: Zhou Chengming<zhouchengming1@xxxxxxxxxx>
> >Cc: linux-kernel@xxxxxxxxxxxxxxx
> >---
> > kernel/events/core.c | 39 +++++++++++++++++++--------------------
> > 1 file changed, 19 insertions(+), 20 deletions(-)
> >
> >diff --git a/kernel/events/core.c b/kernel/events/core.c
> >index 6c4e523..1dca484 100644
> >--- a/kernel/events/core.c
> >+++ b/kernel/events/core.c
> >@@ -10010,28 +10010,27 @@ static int perf_event_set_clock(struct perf_event *event, clockid_t clk_id)
> > goto err_context;
> >
> > /*
> >- * Do not allow to attach to a group in a different
> >- * task or CPU context:
> >+ * Make sure we're both events for the same CPU;
> >+ * grouping events for different CPUs is broken; since
> >+ * you can never concurrently schedule them anyhow.
> > */
> >- if (move_group) {
> >- /*
> >- * Make sure we're both on the same task, or both
> >- * per-cpu events.
> >- */
> >- if (group_leader->ctx->task != ctx->task)
> >- goto err_context;
> >+ if (group_leader->cpu != event->cpu)
> >+ goto err_context;
> >
> >- /*
> >- * Make sure we're both events for the same CPU;
> >- * grouping events for different CPUs is broken; since
> >- * you can never concurrently schedule them anyhow.
> >- */
> >- if (group_leader->cpu != event->cpu)
> >- goto err_context;
> >- } else {
> >- if (group_leader->ctx != ctx)
> >- goto err_context;
> >- }
> >+ /*
> >+ * Make sure we're both on the same task, or both
> >+ * per-cpu events.
> >+ */
> >+ if (group_leader->ctx->task != ctx->task)
> >+ goto err_context;
> >+
> >+ /*
> >+ * Do not allow to attach to a group in a different task
> >+ * or CPU context. If we're moving SW events, we'll fix
> >+ * this up later, so allow that.
> >+ */
> >+ if (!move_group && group_leader->ctx != ctx)
> >+ goto err_context;
>
> We don't need to check move_group here, the previous two checks
> already make sure the events are on the same task and the same cpu.

That's not sufficient to ensure that they're the same context, however.

> So when move_group needed, they will be moved to the same taskctx or
> cpuctx then.

Consider the case of two "uncore" PMUs, X and Y. Each has its own
cpuctx. You could open a PMU X event with cpu == 0 && !task, and you could
subsequently open a PMU Y event with cpu == 0 && !task, using the X event
as its group leader.

Neither event is a SW event, so we won't set move_group, and thus we
won't move either event.

Each event would be placed in its respective PMU's cpuctx, so
group_leader->ctx != event->ctx. We don't check this again prior to
installing the event, which would go wrong:

  perf_install_in_context(ctx, event, event->cpu)
  -> __perf_install_in_context()
     -> add_event_to_ctx(event, ctx)
        -> perf_group_attach(event)
           -> WARN_ON_ONCE(group_leader->ctx != event->ctx)

... and subsequently a number of other things could go wrong due to this
mismatch.

We need to keep this check in the !move_group case.
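
To make the uncore case concrete, here is a minimal user-space model of
the three checks. This is only a sketch, not the kernel code: the
model_* structs and functions are made up for illustration, returning
-EINVAL stands in for "goto err_context", and the keep_ctx_check flag
exists purely to show what dropping the last check would permit.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for the kernel structures; only the fields the checks read. */
struct model_ctx {
	const void *task;	/* NULL for a per-cpu (cpuctx) context */
};

struct model_event {
	int cpu;
	struct model_ctx *ctx;
};

/*
 * Applies the checks in the same order as the patch. 'keep_ctx_check'
 * is hypothetical: it lets us see the effect of dropping the final
 * !move_group check.
 */
static int model_validate_group(const struct model_event *leader,
				int cpu, const struct model_ctx *ctx,
				bool move_group, bool keep_ctx_check)
{
	assert(leader && leader->ctx && ctx);

	/* Same CPU filter? */
	if (leader->cpu != cpu)
		return -EINVAL;

	/* Same task, or both per-cpu? */
	if (leader->ctx->task != ctx->task)
		return -EINVAL;

	/* Same context, unless the group will be moved later. */
	if (keep_ctx_check && !move_group && leader->ctx != ctx)
		return -EINVAL;

	return 0;
}

/* Two uncore PMUs, X and Y: cpu == 0, !task, but distinct cpuctxs. */
static int model_uncore_case(bool keep_ctx_check)
{
	struct model_ctx x_cpuctx = { .task = NULL };
	struct model_ctx y_cpuctx = { .task = NULL };
	struct model_event x_leader = { .cpu = 0, .ctx = &x_cpuctx };

	/* Neither event is a SW event, so move_group is false. */
	return model_validate_group(&x_leader, 0, &y_cpuctx, false,
				    keep_ctx_check);
}
```

In this model, model_uncore_case(true) rejects the Y event, while
model_uncore_case(false) accepts it even though the two contexts differ,
which is the situation that would later trip the WARN_ON_ONCE() in
perf_group_attach().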

Thanks,
Mark.