Re: [RFC 2/2] perf: Sharing PMU counters across compatible events

From: Song Liu
Date: Mon May 28 2018 - 14:24:54 EST

> On May 28, 2018, at 4:15 AM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
> On Fri, May 04, 2018 at 04:11:02PM -0700, Song Liu wrote:
>> Connection among perf_event and perf_event_dup are built with function
>> rebuild_event_dup_list(cpuctx). This function is only called when events
>> are added/removed or when a task is scheduled in/out. So it is not on
>> critical path of perf_rotate_context().
>
> Why is perf_rotate_context() the only critical path? I would say the
> context switch path is rather critical too.
>
>> @@ -2919,8 +3014,10 @@ static void ctx_sched_out(struct perf_event_context *ctx,
>>
>>  	if (ctx->task) {
>>  		WARN_ON_ONCE(cpuctx->task_ctx != ctx);
>> -		if (!ctx->is_active)
>> +		if (!ctx->is_active) {
>>  			cpuctx->task_ctx = NULL;
>> +			rebuild_event_dup_list(cpuctx);
>> +		}
>>  	}
>>
>>  	/*
>
>> +static void rebuild_event_dup_list(struct perf_cpu_context *cpuctx)
>> +{
>> +	int dup_count = cpuctx->ctx.nr_events;
>> +	struct perf_event_context *ctx = cpuctx->task_ctx;
>> +	struct sched_in_data sid = {
>> +		.ctx = ctx,
>> +		.cpuctx = cpuctx,
>> +		.can_add_hw = 1,
>> +	};
>> +
>> +	if (ctx)
>> +		dup_count += ctx->nr_events;
>> +
>> +	kfree(cpuctx->dup_event_list);
>> +	cpuctx->dup_event_count = 0;
>> +
>> +	cpuctx->dup_event_list =
>> +		kzalloc(sizeof(struct perf_event_dup) * dup_count, GFP_ATOMIC);
>
>
>   __schedule()
>     local_irq_disable()
>     raw_spin_lock(rq->lock)
>     context_switch()
>       prepare_task_switch()
>         perf_event_task_sched_out()
>           __perf_event_task_sched_out()
>             perf_event_context_sched_out()
>               task_ctx_sched_out()
>                 ctx_sched_out()
>                   rebuild_event_dup_list()
>                     kzalloc()
>                       ...
>                         spin_lock()
>
> Also, as per the above, this nests a regular spin lock inside the
> (raw) rq->lock, which is a no-no.
>
> Not to mention that whole O(n) crud in the scheduling path...

I think we can also fix the scheduling path. To do that, we need to
limit sharing to events within the same ctx. In other words, events in
cpuctx->ctx can only share PMU counters with other events in
cpuctx->ctx, not with events in cpuctx->task_ctx. This will probably
also solve the locking issue here. Let me try it.
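
A rough userspace sketch of what I mean, not the actual patch. All names
here (struct ctx, struct event, struct event_dup, ctx_add_event(),
MAX_DUPS) are hypothetical stand-ins, and the fixed-size array is only an
assumption to show that nothing needs to be allocated at sched in/out
time when each ctx owns its own dup list:

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_DUPS 8	/* hypothetical fixed capacity per context */

/* Hypothetical stand-in for perf_event_dup: one shared counter slot. */
struct event_dup {
	uint64_t config;	/* what the shared counter measures */
	int users;		/* number of events sharing this slot */
};

/* Hypothetical stand-in for perf_event_context, owning its own dup list. */
struct ctx {
	struct event_dup dups[MAX_DUPS];
	int nr_dups;
};

/* Hypothetical stand-in for perf_event. */
struct event {
	uint64_t config;
	struct ctx *ctx;
	int dup_idx;		/* index into ctx->dups, or -1 if unattached */
};

/*
 * Attach an event to a compatible slot in its own context only.
 * This runs when the event is added, not on the context switch path,
 * and never looks at any other context.
 * Returns 0 on success, -1 if the context has no free slot.
 */
static int ctx_add_event(struct ctx *ctx, struct event *ev, uint64_t config)
{
	int i;

	ev->config = config;
	ev->ctx = ctx;
	ev->dup_idx = -1;

	/* Reuse an existing slot with the same config, if there is one. */
	for (i = 0; i < ctx->nr_dups; i++) {
		if (ctx->dups[i].config == config) {
			ctx->dups[i].users++;
			ev->dup_idx = i;
			return 0;
		}
	}

	if (ctx->nr_dups == MAX_DUPS)
		return -1;

	/* Otherwise claim a new slot in this context. */
	i = ctx->nr_dups++;
	ctx->dups[i].config = config;
	ctx->dups[i].users = 1;
	ev->dup_idx = i;
	return 0;
}
```

With this layout, two events with the same config in the cpu ctx end up
on the same slot, while an identical event in the task ctx gets its own
slot in the task ctx's list, so switching tasks would not require
rebuilding anything in the cpu ctx.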

Thanks,
Song