Re: [RFC 0/6] optimize ctx switch with rb-tree

From: David Carrillo-Cisneros
Date: Tue Apr 25 2017 - 14:54:51 EST


>
> If I disable traversing in the per-process case then the overhead disappears.
>
> For the system-wide case the ctx->pinned_groups and ctx->flexible_groups lists are parts of per-cpu perf_cpu_context object and count of iterations is small (#events == 29).


Yes, seems like it would benefit from the rb-tree optimization.

One thing that is wrong in my RFC (as Mark notes in the "enjoyment"
section of https://lkml.org/lkml/2017/1/12/254) is that it does not
take care to disable the right pmu when dealing with contexts that
have events from more than one PMU. One way to fix it is to make the
pmu part of the rb-tree key (as Peter initially suggested) and use
that to iterate over events of the same pmu together.
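
A rough userspace illustration of the pmu-in-the-key idea follows. It
is not kernel code: a sorted array stands in for the rb-tree, every
name is made up for the example, and the secondary key (cpu, then id)
is only an assumption. The point is that once the pmu is the leading
key component, an in-order walk visits same-pmu events as one
contiguous run, so each pmu can be disabled and re-enabled once
around its own events.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct perf_event; illustration only. */
struct sketch_event {
	int pmu_id;	/* assumed: small integer identifying the PMU */
	int cpu;	/* assumed secondary key component */
	int id;		/* tie-breaker so the ordering is total */
};

static int cmp_int(int a, int b)
{
	return (a > b) - (a < b);
}

/* Composite key comparison: pmu first, then cpu, then id. */
static int sketch_key_cmp(const void *a, const void *b)
{
	const struct sketch_event *x = a, *y = b;

	if (x->pmu_id != y->pmu_id)
		return cmp_int(x->pmu_id, y->pmu_id);
	if (x->cpu != y->cpu)
		return cmp_int(x->cpu, y->cpu);
	return cmp_int(x->id, y->id);
}

/* Length of the run of events starting at 'start' that share a pmu. */
static size_t sketch_pmu_run(const struct sketch_event *ev, size_t n,
			     size_t start)
{
	size_t end = start;

	while (end < n && ev[end].pmu_id == ev[start].pmu_id)
		end++;
	return end - start;
}

/*
 * Sort by key (mimicking an in-order tree walk) and visit events one
 * pmu at a time. Returns the number of per-pmu batches, i.e. how many
 * disable/enable pairs a real scheduler would need.
 */
static int sketch_schedule(struct sketch_event *ev, size_t n)
{
	size_t i = 0;
	int batches = 0;

	qsort(ev, n, sizeof(*ev), sketch_key_cmp);
	while (i < n) {
		size_t len = sketch_pmu_run(ev, n, i);

		assert(len > 0);	/* guarantees forward progress */
		/* A real scheduler would disable only this pmu here,
		 * add ev[i] .. ev[i + len - 1], then re-enable it. */
		batches++;
		i += len;
	}
	return batches;
}
```

With five events spread over two pmus, sketch_schedule() reorders them
into two contiguous runs and reports two batches, regardless of the
order in which the events were inserted.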

There's still the open question of what to do when pmu->add fails.
Currently, the scheduler stops scheduling events, but that's not right
when dealing with events in "software context" that are not software
events (I am looking at you, CQM) or with hardware contexts that have
more than one PMU (ARM big.LITTLE). Ideally a change to the event
scheduler should address that, but it requires more work. Here is a
discussion with Peter about that: https://lkml.org/lkml/2017/1/25/365
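
To make the problem concrete, here is a small userspace sketch that
contrasts "stop at the first failed add" with one possible
alternative, "stop only for the pmu whose add failed". It is not
kernel code and not a settled design: demo_pmu_add() is a made-up
stand-in for pmu->add that models each pmu as a fixed pool of
counters, and skipping the later events of a full pmu is an
assumption made to keep per-pmu ordering.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_MAX_PMUS 8

/* Hypothetical event: which pmu it belongs to, and whether it got on. */
struct demo_event {
	int pmu_id;	/* assumed index in 0 .. DEMO_MAX_PMUS - 1 */
	bool scheduled;
};

/* Made-up stand-in for pmu->add(): fails when the pmu has no counter left. */
static bool demo_pmu_add(int *free_counters, struct demo_event *ev)
{
	assert(ev->pmu_id >= 0 && ev->pmu_id < DEMO_MAX_PMUS);
	if (free_counters[ev->pmu_id] == 0)
		return false;
	free_counters[ev->pmu_id]--;
	ev->scheduled = true;
	return true;
}

/* Policy A: the first failure stops scheduling for every pmu. */
static size_t demo_sched_stop_all(struct demo_event *ev, size_t n,
				  int *free_counters)
{
	size_t done = 0;

	for (size_t i = 0; i < n; i++) {
		if (!demo_pmu_add(free_counters, &ev[i]))
			break;
		done++;
	}
	return done;
}

/* Policy B: a failure only stops scheduling for the pmu that failed. */
static size_t demo_sched_stop_per_pmu(struct demo_event *ev, size_t n,
				      int *free_counters)
{
	bool full[DEMO_MAX_PMUS] = { false };
	size_t done = 0;

	for (size_t i = 0; i < n; i++) {
		int p = ev[i].pmu_id;

		if (full[p])
			continue;	/* this pmu already failed once */
		if (!demo_pmu_add(free_counters, &ev[i])) {
			full[p] = true;
			continue;
		}
		done++;
	}
	return done;
}
```

With two events on a pmu that has one counter followed by two events
on a pmu that has two counters, policy A schedules a single event
while policy B schedules three. Events on the second pmu are starved
under policy A even though that pmu has room.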

If you guys want to work on it, I'll be happy to help review.
Otherwise, I'll get to it as soon as I have a chance (1-2 months).

Thanks,
David