Re: [PATCH v5 perf, bpf-next 3/7] perf, bpf: introduce PERF_RECORD_BPF_EVENT

From: Song Liu
Date: Wed Jan 09 2019 - 06:33:28 EST




> On Jan 9, 2019, at 2:18 AM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
> On Tue, Jan 08, 2019 at 11:54:04PM +0000, Song Liu wrote:
>
>> I think Intel PT case is at instruction granularity (instead of ksymbol
>> granularity)?
>
> Yes.
>
>> If this is true, modules, BPF, and PT could still share
>> the ksymbol record for basic profiling. And advanced use cases like
>> annotation will depend on user space to record BPF_EVENT (and equivalent
>> for other cases) timely. But at least, the ksymbol is already there.
>>
>> Does this make sense?
>
> I'm not sure I follow; the idea was that on ksym events we copy out the
> instructions using kcore. The ksym event already has addr+len.

I was thinking about the scenario where text is modified in place. In
that case, we can use something like

struct perf_record_text_modify {
	u64		addr;
	u_big_enough	old_instr;	/* placeholder type, wide enough for the text */
	u_big_enough	new_instr;
	u64		timestamp;
};

It is a fixed-size record, and we don't need to process it immediately
in user space. At the end of the perf run, a series of these events will
help us reconstruct the exact text at any point in time.
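
For illustration only, the replay at the end of the run could look
roughly like the sketch below. The struct layout, INSTR_MAX and
replay_text() are made up for this example (nothing here is proposed
ABI). It assumes the records are already sorted by timestamp and that
we hold a copy of the initial text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define INSTR_MAX 16	/* assumed stand-in for "u_big_enough" */

/* Hypothetical layout; field names and sizes are assumptions. */
struct text_modify {
	uint64_t addr;
	uint64_t timestamp;
	uint8_t  len;			/* valid bytes in old/new_instr */
	uint8_t  old_instr[INSTR_MAX];
	uint8_t  new_instr[INSTR_MAX];
};

/*
 * Patch `text` (an initial copy of the text starting at address `base`)
 * with every record whose timestamp is <= `when`. Records must be
 * sorted by timestamp. Returns the number of records applied, or -1 if
 * a record is out of range or its old bytes do not match.
 */
static int replay_text(uint8_t *text, size_t text_len, uint64_t base,
		       const struct text_modify *recs, size_t nr,
		       uint64_t when)
{
	int applied = 0;

	assert(text && recs);
	for (size_t i = 0; i < nr && recs[i].timestamp <= when; i++) {
		const struct text_modify *r = &recs[i];
		uint64_t off;

		if (r->len > INSTR_MAX || r->addr < base)
			return -1;
		off = r->addr - base;
		if (off > text_len || r->len > text_len - off)
			return -1;
		/* The old bytes double as a consistency check. */
		if (memcmp(text + off, r->old_instr, r->len))
			return -1;
		memcpy(text + off, r->new_instr, r->len);
		applied++;
	}
	return applied;
}
```

Calling replay_text() with the timestamp of a sample gives the text as
it looked when that sample was taken, which is what annotation needs.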

>
> All we need is some means of ensuring the symbol is still there by the
> time we see the event and do the copy.
>
> I think we can do this with a new ioctl() on /proc/kcore itself:
>
> - when we have kcore open, we queue all text-free operations on list-1.
>
> - when we close kcore, we drain all (text-free) list-* and perform the
> pending frees immediately.
>
> - on ioctl(KCORE_QC) we perform the pending free of list-3 and advance
> list-2 to list-3 and list-1 to list-2.
>
> Perf would then open kcore at the start of the record, make a complete
> copy and keep the FD open. At the end of every buffer process, we issue
> KCORE_QC IFF we observed a ksym unreg in that buffer.
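
To check that I read the three-list scheme correctly, here is a rough
user-space model of it. All names are made up for illustration (this is
not kernel code), "freeing" only sets a flag, and freeing immediately
when kcore is not open is my assumption:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one pending text-free operation. */
struct text_free {
	struct text_free *next;
	int freed;		/* set once the model "frees" the text */
};

struct kcore_defer {
	int open;			/* does a reader hold kcore open? */
	struct text_free *list[3];	/* list[0] is list-1, list[2] is list-3 */
};

/* Perform every pending free on one list and leave it empty. */
static void run_frees(struct text_free **head)
{
	while (*head) {
		struct text_free *e = *head;

		*head = e->next;
		e->next = NULL;
		e->freed = 1;
	}
}

/* Where the kernel would free text: queue on list-1 while kcore is open. */
static void defer_text_free(struct kcore_defer *d, struct text_free *e)
{
	assert(d && e);
	if (!d->open) {		/* assumption: no reader, free right away */
		e->freed = 1;
		return;
	}
	e->next = d->list[0];
	d->list[0] = e;
}

/* Model of ioctl(KCORE_QC): free list-3, then advance the other two. */
static void kcore_qc(struct kcore_defer *d)
{
	run_frees(&d->list[2]);
	d->list[2] = d->list[1];
	d->list[1] = d->list[0];
	d->list[0] = NULL;
}

/* Model of close(): drain all lists immediately. */
static void kcore_close(struct kcore_defer *d)
{
	for (int i = 0; i < 3; i++)
		run_frees(&d->list[i]);
	d->open = 0;
}
```

In this model, text queued for freeing survives two KCORE_QC calls and
goes away on the third, or at close, whichever comes first.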

Does this mean we need to scan every buffer before writing it to perf.data
during perf-record?

Also, if we need ksym unreg here, I guess this is NOT really modifying
text in place, but creating a new version and swapping it in? Then can
we include something like this in perf.data:

struct perf_record_text_modify {
	u64	old_addr;
	u64	new_addr;
	u32	old_len;	/* up to MAX_SIZE */
	u32	new_len;	/* up to MAX_SIZE */
	u8	old_text[MAX_SIZE];
	u8	new_text[MAX_SIZE];
	u64	timestamp;
};

This way, the record is embedded in perf.data and doesn't require
extra processing during perf-record (only at the end of perf-record).
This would work for the text-modification case, since a modification is
fully described by the old text and the new text.
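
As a rough sketch of how such a record could be filled (MAX_SIZE = 64,
the struct name and fill_text_swap() are all assumptions for this
example, not a proposal for the actual layout):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_SIZE 64	/* assumed bound; the real value is still open */

/* Hypothetical user-space mirror of the record above. */
struct text_swap_record {
	uint64_t old_addr;
	uint64_t new_addr;
	uint32_t old_len;	/* up to MAX_SIZE */
	uint32_t new_len;	/* up to MAX_SIZE */
	uint8_t  old_text[MAX_SIZE];
	uint8_t  new_text[MAX_SIZE];
	uint64_t timestamp;
};

/* Fill one record. Returns 0, or -1 if either text exceeds MAX_SIZE. */
static int fill_text_swap(struct text_swap_record *rec,
			  uint64_t old_addr, const void *old_text,
			  uint32_t old_len,
			  uint64_t new_addr, const void *new_text,
			  uint32_t new_len, uint64_t timestamp)
{
	assert(rec && old_text && new_text);
	if (old_len > MAX_SIZE || new_len > MAX_SIZE)
		return -1;

	memset(rec, 0, sizeof(*rec));	/* unused tail bytes stay zero */
	rec->old_addr = old_addr;
	rec->new_addr = new_addr;
	rec->old_len = old_len;
	rec->new_len = new_len;
	memcpy(rec->old_text, old_text, old_len);
	memcpy(rec->new_text, new_text, new_len);
	rec->timestamp = timestamp;
	return 0;
}
```

The price of the fixed size is that every record carries 2 * MAX_SIZE
bytes of text, even when only a few bytes change.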

A similar solution would not work for the BPF case, as bpf_prog_info is
getting a lot more members in the near future.

Does this make sense...?

Thanks,
Song