[PATCH 0/4] Optimize cgroup context switch

Date: Mon Apr 29 2019 - 10:45:21 EST


From: Kan Liang <kan.liang@xxxxxxxxxxxxxxx>

On systems with very high context switch rates between cgroups,
cgroup perf monitoring incurs high overhead.

The current code has two issues.
- System-wide events are mistakenly switched out and back in during
cgroup context switches. This miscounts system-wide events and adds
avoidable overhead.
Patch 1 fixes the issue. (A sketch of the idea follows this list.)
- The cgroup context switch sched_in path is inefficient.
All cgroup events share the same per-cpu pinned/flexible groups,
and the RB trees for the pinned/flexible groups know nothing about
cgroups. The current code has to traverse all events and call
event_filter_match() to pick out the events of the scheduled-in
cgroup.
Patches 2-4 add a fast path for cgroup context switch sched_in by
teaching the RB tree about cgroups, so the extra filtering can be
avoided. (See the second sketch below.)
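
A minimal sketch of the patch-1 idea, assuming CONFIG_CGROUP_PERF;
sketch_cgroup_switch_out() is a hypothetical helper for illustration,
not the actual change:

#include <linux/list.h>
#include <linux/perf_event.h>

/*
 * On a cgroup context switch, only events bound to a cgroup need
 * to be rescheduled; plain system-wide events (event->cgrp is
 * NULL) can keep counting across the switch.
 */
static void sketch_cgroup_switch_out(struct perf_event_context *ctx)
{
        struct perf_event *event;

        list_for_each_entry(event, &ctx->event_list, event_entry) {
                if (!event->cgrp)
                        continue;       /* system-wide: leave it running */
                /* ... stop only this cgroup-bound event ... */
        }
}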
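
And a minimal sketch of the keying idea behind patches 2-4; the
struct and function names here are made up (the real changes live in
kernel/events/core.c):

#include <linux/rbtree.h>
#include <linux/types.h>

/*
 * Put a cgroup ID between the CPU and the insertion index in the
 * RB-tree key, so all events of one cgroup form one contiguous
 * range of the tree.
 */
struct sketch_group_node {
        int             cpu;            /* primary key */
        u64             cgrp_id;        /* secondary key: the new part */
        u64             group_index;    /* tertiary key: insertion order */
        struct rb_node  node;
};

static bool sketch_node_less(const struct sketch_group_node *a,
                             const struct sketch_group_node *b)
{
        if (a->cpu != b->cpu)
                return a->cpu < b->cpu;
        if (a->cgrp_id != b->cgrp_id)
                return a->cgrp_id < b->cgrp_id;
        return a->group_index < b->group_index;
}

With such a key, cgroup sched_in can find the leftmost node matching
(cpu, cgrp_id) and walk in order until the cgroup ID changes, instead
of calling event_filter_match() on every event in the tree.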


Here is a test with 6 cgroups running.
Each cgroup runs a specjbb benchmark.
The perf command is as below.
perf stat -e cycles,instructions -e cycles,instructions
-e cycles,instructions -e cycles,instructions
-e cycles,instructions -e cycles,instructions
-G cgroup1,cgroup1,cgroup2,cgroup2,cgroup3,cgroup3
-G cgroup4,cgroup4,cgroup5,cgroup5,cgroup6,cgroup6
-a -e cycles,instructions -I 1000

The average RT (Response Time) reported by specjbb is used as the
key performance metric. (Lower is better.)

                                     RT (us)   Overhead
Baseline (no perf stat):              4286.9
Use cgroup perf, no patches:          4483.6       4.6%
Use cgroup perf, apply patch 1:       4369.2       1.9%
Use cgroup perf, apply all patches:   4335.3       1.1%

Kan Liang (4):
  perf: Fix system-wide events miscounting during cgroup monitoring
  perf: Add filter_match() as a parameter for pinned/flexible_sched_in()
  perf cgroup: Add cgroup ID as a key of RB tree
  perf cgroup: Add fast path for cgroup switch

 include/linux/perf_event.h |   7 ++
 kernel/events/core.c       | 171 +++++++++++++++++++++++++++++++++++++++------
 2 files changed, 157 insertions(+), 21 deletions(-)

--
2.7.4