Re: [PATCH v3 10/10] perf/cgroup: Do not switch system-wide events in cgroup switch

From: Liang, Kan
Date: Thu Nov 14 2019 - 15:50:04 EST




On 11/14/2019 10:24 AM, Liang, Kan wrote:


On 11/14/2019 10:16 AM, Liang, Kan wrote:


On 11/14/2019 8:57 AM, Peter Zijlstra wrote:
On Thu, Nov 14, 2019 at 08:46:51AM -0500, Liang, Kan wrote:


On 11/14/2019 5:43 AM, Peter Zijlstra wrote:
On Wed, Nov 13, 2019 at 04:30:42PM -0800, Ian Rogers wrote:
From: Kan Liang <kan.liang@xxxxxxxxxxxxxxx>

When counting system-wide events and cgroup events simultaneously, the
system-wide events are always scheduled out then back in during cgroup
switches, bringing extra overhead and possibly missing events. Switching
out system wide flexible events may be necessary if the scheduled in
task's cgroups have pinned events that need to be scheduled in at a higher
priority than the system wide flexible events.

I'm thinking this patch is actively broken. groups->index 'group' wide
and therefore across cpu/cgroup boundaries.

There is no !cgroup to cgroup hierarchy as this patch seems to assume,
specifically look at how the merge sort in visit_groups_merge() allows
cgroup events to be picked before !cgroup events.


No, the patch intends to avoid switch !cgroup during cgroup context switch.

Which is wrong.

Why we want to switch !cgroup system-wide event in context switch?

How should current perf handle this case?


It seems hard to find a simple case to explain why we should not switch !cgroup during cgroup context switch.

Let me try to explain it using ftrace.

Case 1:
User A do system-wide monitoring for 1 second. No other users.
#perf stat -e branches -a -- sleep 1

The counter counts between 765531.617703 and 765532.620184.
Everything is collected.

<...>-59160 [027] d.h. 765531.617697: write_msr: MSR_CORE_PERF_GLOBAL_CTRL(38f), value 0
<...>-59160 [027] d.h. 765531.617701: write_msr: MSR_IA32_PMC0(4c1), value 800000000001
<...>-59160 [027] d.h. 765531.617702: write_msr: MSR_P6_EVNTSEL0(186), value 5300c4
<...>-59160 [027] d.h. 765531.617703: write_msr: MSR_CORE_PERF_GLOBAL_CTRL(38f), value 70000000f
<idle>-0 [027] d.h. 765532.620184: write_msr: MSR_CORE_PERF_GLOBAL_CTRL(38f), value 0
<idle>-0 [027] d.h. 765532.620185: write_msr: MSR_P6_EVNTSEL0(186), value 1300c4
<idle>-0 [027] d.h. 765532.620186: rdpmc: 0, value 80000b3e87a4
<idle>-0 [027] d.h. 765532.620187: write_msr: MSR_CORE_PERF_GLOBAL_CTRL(38f), value 70000000f


Case 2:
User A do system-wide monitoring for 1 second.
#perf stat -e branches -a -- sleep 1
At the meantime, User B do cgroup monitoring.
#perf stat -e cycles -G cgroup

The User A expects to collect everything from 765580.196521 to 765581.198150. But it doesn't.

Because of cgroup context switch, the system-wide event for user A stops counting at [765580.213882, 765580.213884],
[765580.213913, 765580.213915], ..., [765580.774304, 765580.774307].

I think it breaks the usage of User A.

Furthermore, switching !cgroup system-wide event also brings extra overhead, which is unnecessary.

<...>-121292 [027] d.h. 765580.196514: write_msr: MSR_CORE_PERF_GLOBAL_CTRL(38f), value 0
<...>-121292 [027] d.h. 765580.196519: write_msr: MSR_IA32_PMC0(4c1), value 800000000001
<...>-121292 [027] d.h. 765580.196520: write_msr: MSR_P6_EVNTSEL0(186), value 5300c4
<...>-121292 [027] d.h. 765580.196521: write_msr: MSR_CORE_PERF_GLOBAL_CTRL(38f), value 70000000f
<idle>-0 [027] d... 765580.213878: write_msr: MSR_CORE_PERF_GLOBAL_CTRL(38f), value 0
<idle>-0 [027] d... 765580.213880: write_msr: MSR_P6_EVNTSEL0(186), value 1300c4
<idle>-0 [027] d... 765580.213880: rdpmc: 0, value 800000357bc1
<idle>-0 [027] d... 765580.213882: write_msr: MSR_CORE_PERF_GLOBAL_CTRL(38f), value 70000000f
simics-poll-25601 [027] d... 765580.213884: write_msr: MSR_CORE_PERF_GLOBAL_CTRL(38f), value 0
simics-poll-25601 [027] d... 765580.213888: write_msr: MSR_CORE_PERF_FIXED_CTR1(30a), value 800015820cbe
simics-poll-25601 [027] d... 765580.213889: read_msr: MSR_CORE_PERF_FIXED_CTR_CTRL(38d), value 0
simics-poll-25601 [027] d... 765580.213890: write_msr: MSR_CORE_PERF_FIXED_CTR_CTRL(38d), value b0
simics-poll-25601 [027] d... 765580.213890: write_msr: MSR_IA32_PMC0(4c1), value 800000357bc1
simics-poll-25601 [027] d... 765580.213891: write_msr: MSR_P6_EVNTSEL0(186), value 5300c4
simics-poll-25601 [027] d... 765580.213892: write_msr: MSR_CORE_PERF_GLOBAL_CTRL(38f), value 70000000f
simics-poll-25601 [027] d... 765580.213910: write_msr: MSR_CORE_PERF_GLOBAL_CTRL(38f), value 0
simics-poll-25601 [027] d... 765580.213911: read_msr: MSR_CORE_PERF_FIXED_CTR_CTRL(38d), value b0
simics-poll-25601 [027] d... 765580.213911: write_msr: MSR_CORE_PERF_FIXED_CTR_CTRL(38d), value 0
simics-poll-25601 [027] d... 765580.213911: rdpmc: 40000001, value 80001582b676
simics-poll-25601 [027] d... 765580.213912: write_msr: MSR_P6_EVNTSEL0(186), value 1300c4
simics-poll-25601 [027] d... 765580.213913: rdpmc: 0, value 800000358491
simics-poll-25601 [027] d... 765580.213913: write_msr: MSR_CORE_PERF_GLOBAL_CTRL(38f), value 70000000f
<idle>-0 [027] d... 765580.213915: write_msr: MSR_CORE_PERF_GLOBAL_CTRL(38f), value 0
<idle>-0 [027] d... 765580.213916: write_msr: MSR_IA32_PMC0(4c1), value 800000358491
<idle>-0 [027] d... 765580.213916: write_msr: MSR_P6_EVNTSEL0(186), value 5300c4
<idle>-0 [027] d... 765580.213917: write_msr: MSR_CORE_PERF_GLOBAL_CTRL(38f), value 70000000f

... ...

simics-poll-25601 [027] d... 765580.774301: write_msr: MSR_CORE_PERF_GLOBAL_CTRL(38f), value 0
simics-poll-25601 [027] d... 765580.774302: read_msr: MSR_CORE_PERF_FIXED_CTR_CTRL(38d), value b0
simics-poll-25601 [027] d... 765580.774302: write_msr: MSR_CORE_PERF_FIXED_CTR_CTRL(38d), value 0
simics-poll-25601 [027] d... 765580.774302: rdpmc: 40000001, value 8000165e927b
simics-poll-25601 [027] d... 765580.774303: write_msr: MSR_P6_EVNTSEL0(186), value 1300c4
simics-poll-25601 [027] d... 765580.774303: rdpmc: 0, value 8000059298ce
simics-poll-25601 [027] d... 765580.774304: write_msr: MSR_CORE_PERF_GLOBAL_CTRL(38f), value 70000000f
<...>-135379 [027] d... 765580.774307: write_msr: MSR_CORE_PERF_GLOBAL_CTRL(38f), value 0
<...>-135379 [027] d... 765580.774308: write_msr: MSR_IA32_PMC0(4c1), value 8000059298ce
<...>-135379 [027] d... 765580.774309: write_msr: MSR_P6_EVNTSEL0(186), value 5300c4
<...>-135379 [027] d... 765580.774309: write_msr: MSR_CORE_PERF_GLOBAL_CTRL(38f), value 70000000f
<...>-147127 [027] d.h. 765581.198150: write_msr: MSR_CORE_PERF_GLOBAL_CTRL(38f), value 0
<...>-147127 [027] d.h. 765581.198153: write_msr: MSR_P6_EVNTSEL0(186), value 1300c4
<...>-147127 [027] d.h. 765581.198153: rdpmc: 0, value 80000a573368
<...>-147127 [027] d.h. 765581.198155: write_msr: MSR_CORE_PERF_GLOBAL_CTRL(38f), value 70000000f


Thanks,
Kan