Re: [PATCH] perf-stat: introduce bperf, share hardware PMCs with BPF

From: Song Liu
Date: Sat Mar 13 2021 - 18:04:37 EST

> On Mar 13, 2021, at 2:06 PM, Jiri Olsa <jolsa@xxxxxxxxxx> wrote:
>
> On Fri, Mar 12, 2021 at 04:09:53PM +0000, Song Liu wrote:
>>
>>
>>> On Mar 12, 2021, at 7:45 AM, Song Liu <songliubraving@xxxxxx> wrote:
>>>
>>>
>>>
>>>> On Mar 12, 2021, at 4:12 AM, Jiri Olsa <jolsa@xxxxxxxxxx> wrote:
>>>>
>>>> On Thu, Mar 11, 2021 at 06:02:57PM -0800, Song Liu wrote:
>>>>> perf uses performance monitoring counters (PMCs) to monitor system
>>>>> performance. The PMCs are limited hardware resources. For example,
>>>>> Intel CPUs have 3 fixed PMCs and 4 programmable PMCs per CPU.
>>>>>
>>>>> Modern data center systems use these PMCs in many different ways:
>>>>> system level monitoring, (maybe nested) container level monitoring, per
>>>>> process monitoring, profiling (in sample mode), etc. In some cases,
>>>>> there are more active perf_events than available hardware PMCs. To allow
>>>>> all perf_events to have a chance to run, it is necessary to do expensive
>>>>> time multiplexing of events.
>>>>>
>>>>> On the other hand, many monitoring tools count the common metrics (cycles,
>>>>> instructions). It is a waste to have multiple tools create multiple
>>>>> perf_events of "cycles" and occupy multiple PMCs.
>>>>>
>>>>> bperf tries to reduce such waste by allowing multiple perf_events of
>>>>> "cycles" or "instructions" (at different scopes) to share PMCs. Instead
>>>>> of having each perf-stat session read its own perf_events, bperf uses
>>>>> BPF programs to read the perf_events and aggregate the readings into BPF
>>>>> maps. The perf-stat session(s) then read the values from these BPF maps.
>>>>>
>>>>> Please refer to the comment before the definition of bperf_ops for a
>>>>> description of the bperf architecture.
>>>>>
>>>>> bperf is off by default. To enable it, pass the --use-bpf option to
>>>>> perf-stat. bperf uses a BPF hashmap, pinned to bpffs, to share
>>>>> information about the BPF programs and maps it uses. The default path is
>>>>> /sys/fs/bpf/bperf_attr_map; the user can change it with the --attr-map
>>>>> option.
>>>>
>>>> nice, I recall the presentation about that and was wondering
>>>> when this will come up ;-)
>>>
>>> Progress has been slower than I expected, but over the last year I
>>> finished some dependencies of this:
>>>
>>> 1. BPF_PROG_TEST_RUN for raw_tp events;
>>> 2. perf-stat -b, which introduced the skeleton and bpf_counter;
>>> 3. BPF task local storage. I didn't use it in this version, but it could
>>> help optimize bperf in the future.
>>>
>>>>
>>>>>
>>>>> ---
>>>>> Known limitations:
>>>>> 1. Does not support per-cgroup events;
>>>>> 2. Does not support monitoring of BPF programs (perf-stat -b);
>>>>> 3. Does not support event groups.
>>>>>
>>>>> The following commands have been tested:
>>>>>
>>>>> perf stat --use-bpf -e cycles -a
>>>>> perf stat --use-bpf -e cycles -C 1,3,4
>>>>> perf stat --use-bpf -e cycles -p 123
>>>>> perf stat --use-bpf -e cycles -t 100,101
>>>>
>>>> I assume the output is same as standard perf?
>>
>> Btw, please give it a try. :)
>>
>> It worked pretty well in my tests. If it doesn't work for some combination
>> of options, please let me know.
>
> heya, can't compile
>
> CLANG /home/jolsa/linux-perf/tools/perf/util/bpf_skel/.tmp/bperf_follower.bpf.o
> util/bpf_skel/bperf_follower.bpf.c:8:10: fatal error: 'bperf_u.h' file not found
> #include "bperf_u.h"
> ^~~~~~~~~~~

Oops, I forgot git-add. :(

The file is very simple:

tools/perf/util/bpf_skel/bperf_u.h:


// SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
// Copyright (c) 2021 Facebook

#ifndef __BPERF_STAT_U_H
#define __BPERF_STAT_U_H

enum bperf_filter_type {
	BPERF_FILTER_GLOBAL = 1,
	BPERF_FILTER_CPU,
	BPERF_FILTER_PID,
	BPERF_FILTER_TGID,
};

#endif /* __BPERF_STAT_U_H */

Thanks,
Song