Re: [PATCH v5 perf, bpf-next 3/7] perf, bpf: introduce PERF_RECORD_BPF_EVENT

From: Peter Zijlstra
Date: Tue Jan 08 2019 - 14:43:32 EST


On Tue, Jan 08, 2019 at 07:10:20PM +0000, Song Liu wrote:
> > On Jan 8, 2019, at 10:41 AM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> > On Thu, Dec 20, 2018 at 10:29:00AM -0800, Song Liu wrote:
> >> @@ -986,9 +987,35 @@ enum perf_event_type {
> >>      */
> >>     PERF_RECORD_KSYMBOL = 17,
> >>
> >> +   /*
> >> +    * Record bpf events:
> >> +    *  enum perf_bpf_event_type {
> >> +    *      PERF_BPF_EVENT_UNKNOWN     = 0,
> >> +    *      PERF_BPF_EVENT_PROG_LOAD   = 1,
> >> +    *      PERF_BPF_EVENT_PROG_UNLOAD = 2,
> >> +    *  };
> >> +    *
> >> +    * struct {
> >> +    *      struct perf_event_header header;
> >> +    *      u16 type;
> >> +    *      u16 flags;
> >> +    *      u32 id;
> >> +    *      u8 tag[BPF_TAG_SIZE];
> >> +    *      struct sample_id sample_id;
> >> +    * };
> >> +    */
> >> +   PERF_RECORD_BPF_EVENT = 18,
> >> +
> >
> > Elsewhere today, I raised the point that by the time (however short
> > interval) userspace gets around to reading this event, the actual
> > program could be gone again.
> >
> > In this case the program has been with us for a very short period
> > indeed; but it could still have generated some samples or otherwise
> > generated trace data.
>
> Since we already have the separate KSYMBOL events, BPF_EVENT is only
> required for advanced use cases, like annotation. So I guess missing
> it for very short-lived programs should not be a huge problem?
>
> > It was suggested to allow pinning modules/programs to avoid this
> > situation, but that of course has other undesirable effects, such as a
> > trivial DoS.
> >
> > A truly horrible hack would be to include an open filedesc in the event
> > that needs closing to release the resource, but I'm sorry for even
> > suggesting that **shudder**.
> >
> > Do we have any sane ideas?
>
> How about we gate the open filedesc solution behind an option, and limit
> that option to root only? If this still sounds hacky, maybe we should
> just accept that short-lived programs are missed?

I'm afraid we might also 'need' this for the kallsym thing.

The problem is that things like Intel PT (ARM Coresight too IIRC) encode
a bitstream of branch-taken decisions. The only way to decode that and
reconstruct the actual code-flow is with an exact matching text image.
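
As a toy illustration of why the image has to match exactly (a sketch
only: the three-op instruction set and the names toy_insn and toy_decode
are made up for this example; real PT decoding walks actual x86 text and
a much richer packet format), the decoder below consumes one
taken/not-taken bit per conditional branch and reconstructs the path:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy instruction set, standing in for real machine text. */
enum toy_op { TOY_NOP, TOY_JCC, TOY_RET };

struct toy_insn {
	enum toy_op op;
	size_t target;	/* destination index, only meaningful for TOY_JCC */
};

/*
 * Walk 'text' from index 0. Every conditional branch consumes one bit of
 * 'tnt' (LSB first, 1 = taken). Visited indices are stored in 'path'.
 * Returns the number of instructions visited up to and including the
 * TOY_RET, or 0 if the trace bits, the path buffer or the text run out.
 */
size_t toy_decode(const struct toy_insn *text, size_t ntext,
		  uint64_t tnt, unsigned int nbits,
		  size_t *path, size_t npath)
{
	size_t ip = 0, n = 0;
	unsigned int used = 0;

	assert(text && path);

	while (ip < ntext && n < npath) {
		path[n++] = ip;

		switch (text[ip].op) {
		case TOY_RET:
			return n;
		case TOY_JCC:
			if (used >= nbits || used >= 64)
				return 0;	/* no branch decision left */
			if ((tnt >> used++) & 1)
				ip = text[ip].target;
			else
				ip++;
			break;
		default:
			ip++;
			break;
		}
	}

	return 0;
}
```

Feeding the same bit to an image whose branch sits at a different index
yields a different path with no error reported, which is why a stale or
missing image is useless to the decoder.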

In order to have this matching text we need to be able to copy out every
piece of dynamic text (from kcore) that has ever executed before it
disappears.

Elsewhere [*], Andi suggests having a kind of text-free fence
interface, where userspace can signal completion. And I suppose as long
as we know there is a consumer, we also know we'll not be blocked
indefinitely. So it would have to be slightly more complicated than
suggested, but I think that is something we could work with.

It would also not complicate these events.
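
To sketch what I mean by "slightly more complicated" (a userspace model
only, under my own assumptions: struct text_fence and the fence_*()
helpers are hypothetical names, not an existing kernel interface and not
necessarily what Andi proposed), the free is held back until the
consumer acknowledges it, and a departing consumer releases everything
that is still pending:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model; none of these names exist in the kernel. */
struct text_fence {
	bool consumer;		/* is a reader (e.g. perf) attached? */
	uint64_t queued;	/* last ticket handed out for a pending free */
	uint64_t completed;	/* last ticket the consumer acknowledged */
};

/* Kernel side: some dynamic text wants to go away; returns its ticket. */
uint64_t fence_queue_free(struct text_fence *f)
{
	assert(f);
	return ++f->queued;
}

/* Userspace side: everything up to and including 'seq' has been copied. */
void fence_complete(struct text_fence *f, uint64_t seq)
{
	assert(f);
	if (seq > f->queued)
		seq = f->queued;	/* cannot acknowledge the future */
	if (seq > f->completed)
		f->completed = seq;	/* acknowledgements only move forward */
}

/* Consumer went away (closed the event, died, ...). */
void fence_detach(struct text_fence *f)
{
	assert(f);
	f->consumer = false;
}

/*
 * Kernel side: may the text behind ticket 'seq' be freed now? Without a
 * consumer there is nobody to wait for, so the free is never blocked
 * indefinitely.
 */
bool fence_may_free(const struct text_fence *f, uint64_t seq)
{
	assert(f);
	return !f->consumer || f->completed >= seq;
}
```

A real implementation would additionally need locking and some way to
bound how long an attached but stuck consumer can hold text alive; the
model leaves both out.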



[*] https://lkml.kernel.org/r/20190108172721.GN6118@xxxxxxxxxxxxxxxxxxxx