Re: [PATCH v5 perf, bpf-next 3/7] perf, bpf: introduce PERF_RECORD_BPF_EVENT

From: Arnaldo Carvalho de Melo
Date: Tue Jan 08 2019 - 15:16:31 EST


On Tue, Jan 08, 2019 at 07:10:20PM +0000, Song Liu wrote:
> > On Jan 8, 2019, at 10:41 AM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> > On Thu, Dec 20, 2018 at 10:29:00AM -0800, Song Liu wrote:
> >> @@ -986,9 +987,35 @@ enum perf_event_type {
> >> */
> >> PERF_RECORD_KSYMBOL = 17,
> >>
> >> + /*
> >> + * Record bpf events:
> >> + * enum perf_bpf_event_type {
> >> + * PERF_BPF_EVENT_UNKNOWN = 0,
> >> + * PERF_BPF_EVENT_PROG_LOAD = 1,
> >> + * PERF_BPF_EVENT_PROG_UNLOAD = 2,
> >> + * };
> >> + *
> >> + * struct {
> >> + * struct perf_event_header header;
> >> + * u16 type;
> >> + * u16 flags;
> >> + * u32 id;
> >> + * u8 tag[BPF_TAG_SIZE];
> >> + * struct sample_id sample_id;
> >> + * };
> >> + */
> >> + PERF_RECORD_BPF_EVENT = 18,

> > It was suggested to allow pinning modules/programs to avoid this
> > situation, but that of course has other undesirable effects, such as a
> > trivial DoS.
> >
> > A truly horrible hack would be to include an open filedesc in the event
> > that needs closing to release the resource, but I'm sorry for even
> > suggesting that **shudder**.
> >
> > Do we have any sane ideas?
>
> How about we gate the open filedesc solution with an option, and limit
> that option for root only? If this still sounds hacky, maybe we should
> just ignore when short-living programs are missed?

Short-lived short programs could go in the event itself? For short-lived long
programs... one could ask for a max number of bytes of the binary?

The smallest kernel modules are 16KB, a multiple of PAGE_SIZE:

[acme@quaco perf]$ cat /proc/modules | sort -k2 -nr | tail
ebtable_nat 16384 1 - Live 0x0000000000000000
ebtable_filter 16384 1 - Live 0x0000000000000000
crct10dif_pclmul 16384 0 - Live 0x0000000000000000
crc32_pclmul 16384 0 - Live 0x0000000000000000
coretemp 16384 0 - Live 0x0000000000000000
btrtl 16384 1 btusb, Live 0x0000000000000000
btbcm 16384 1 btusb, Live 0x0000000000000000
arc4 16384 2 - Live 0x0000000000000000
acpi_thermal_rel 16384 1 int3400_thermal, Live 0x0000000000000000
ac97_bus 16384 1 snd_soc_core, Live 0x0000000000000000
[acme@quaco perf]$

On a Fedora 29 system I have these loaded, all rather small:

# bpftool prog
13: cgroup_skb tag 7be49e3934a125ba gpl
loaded_at 2019-01-04T14:40:32-0300 uid 0
xlated 296B jited 229B memlock 4096B map_ids 13,14
14: cgroup_skb tag 2a142ef67aaad174 gpl
loaded_at 2019-01-04T14:40:32-0300 uid 0
xlated 296B jited 229B memlock 4096B map_ids 13,14
15: cgroup_skb tag 7be49e3934a125ba gpl
loaded_at 2019-01-04T14:40:32-0300 uid 0
xlated 296B jited 229B memlock 4096B map_ids 15,16
16: cgroup_skb tag 2a142ef67aaad174 gpl
loaded_at 2019-01-04T14:40:32-0300 uid 0
xlated 296B jited 229B memlock 4096B map_ids 15,16
17: cgroup_skb tag 7be49e3934a125ba gpl
loaded_at 2019-01-04T14:40:43-0300 uid 0
xlated 296B jited 229B memlock 4096B map_ids 17,18
18: cgroup_skb tag 2a142ef67aaad174 gpl
loaded_at 2019-01-04T14:40:43-0300 uid 0
xlated 296B jited 229B memlock 4096B map_ids 17,18
21: cgroup_skb tag 7be49e3934a125ba gpl
loaded_at 2019-01-04T14:40:43-0300 uid 0
xlated 296B jited 229B memlock 4096B map_ids 21,22
22: cgroup_skb tag 2a142ef67aaad174 gpl
loaded_at 2019-01-04T14:40:43-0300 uid 0
xlated 296B jited 229B memlock 4096B map_ids 21,22
[root@quaco IRPF2018]#


Running 'perf trace' with its BPF augmenter, I get these two more:

158: tracepoint name sys_enter tag 12504ba9402f952f gpl
loaded_at 2019-01-08T17:12:39-0300 uid 0
xlated 512B jited 374B memlock 4096B map_ids 118,117,116
159: tracepoint name sys_exit tag c1bd85c092d6e4aa gpl
loaded_at 2019-01-08T17:12:39-0300 uid 0
xlated 256B jited 191B memlock 4096B map_ids 118,117
[root@quaco ~]#

A PERF_RECORD_MMAP gets as its payload up to PATH_MAX - sizeof(u64).

So for a class of programs, shoving the binary together with a
PERF_RECORD_MMAP-like event may be enough?
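
To put numbers on that, here is a rough sketch of the size check, assuming
the inline budget is the same PATH_MAX - sizeof(u64) cap mentioned above.
INLINE_BUDGET and jited_fits_inline() are made-up names for illustration,
nothing in the patch defines them:

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Fall back to the Linux value if <limits.h> does not expose PATH_MAX. */
#ifndef PATH_MAX
#define PATH_MAX 4096
#endif

/*
 * Hypothetical budget for an inline binary: the same cap quoted above
 * for the PERF_RECORD_MMAP payload. An assumption, not part of the patch.
 */
#define INLINE_BUDGET ((size_t)PATH_MAX - sizeof(uint64_t))

/*
 * Hypothetical helper: would a JITed image of jited_len bytes fit
 * inside such an event? Empty images are rejected.
 */
bool jited_fits_inline(size_t jited_len)
{
	return jited_len > 0 && jited_len <= INLINE_BUDGET;
}
```

With PATH_MAX at 4096 the budget is 4088 bytes, so every program in the
bpftool output above (jited 229B, 374B and 191B) would fit, while even the
smallest 16KB module would not.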

You started the shuddering suggestions... ;-)

- Arnaldo