Re: [PATCH v3 perf, bpf-next 1/4] perf, bpf: Introduce PERF_RECORD_BPF_EVENT

From: Song Liu
Date: Fri Dec 14 2018 - 12:13:06 EST

> On Dec 14, 2018, at 5:48 AM, Arnaldo Carvalho de Melo <acme@xxxxxxxxxx> wrote:
>
> Em Thu, Dec 13, 2018 at 09:48:57PM +0000, Song Liu escreveu:
>>
>>
>>> On Dec 13, 2018, at 10:45 AM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>>>
>>> On Wed, Dec 12, 2018 at 01:33:20PM -0500, Steven Rostedt wrote:
>>>> On Wed, 12 Dec 2018 19:05:53 +0100
>>>> Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>>>>
>>>>> On Wed, Dec 12, 2018 at 05:09:17PM +0000, Song Liu wrote:
>>>>>>> And while this tracks the bpf kallsyms, it does not do all kallsyms.
>>>>>>>
>>>>>>> .... Oooh, I see the problem, everybody is doing their own custom
>>>>>>> kallsym_{add,del}() thing, instead of having that in generic code :-(
>>>>>>>
>>>>>>> This, for example, doesn't track module load/unload nor ftrace
>>>>>>> trampolines, even though both affect kallsyms.
>>>>>>
>>>>>> I think we can use PERF_RECORD_MMAP (or MMAP2) for module load/unload.
>>>>>> That could be separate sets of patches.
>>>>>
>>>>> So I would actually like to move bpf_lock/bpf_kallsyms/bpf_tree +
>>>>> bpf_prog_kallsyms_*() + __bpf_address_lookup() into kernel/kallsyms.c
>>>>> and also have ftrace use that.
>>>>>
>>>>> Because currently the ftrace stuff is otherwise invisible.
>>>>>
>>>>> A generic kallsym register/unregister for any JIT.
>>>>
>>>> That's if it needs to look up the symbols that were recorded when init
>>>> was unloaded.
>>>>
>>>> The ftrace kallsyms is used to save the function names of init code
>>>> that was freed, but may have been recorded. Without the ftrace
>>>> kallsyms the functions traced at init time would just show up as hex
>>>> addresses (not very useful).
>>>>
>>>> I'm not sure how BPF would need those symbols unless they were executed
>>>> during init (module or core) and needed to see what the symbols used to
>>>> be.
>>>
>>> Aah, that sounds entirely dodgy and possibly quite broken. We freed that
>>> init code, so BPF or your trampolines (or a tiny module) could actually
>>> fit in there and insert their own kallsyms, and then we have overlapping
>>> symbols, which would be pretty bad.
>>>
>>> I thought the ftrace kallsym stuff was for the trampolines, which would
>>> be fairly similar to what BPF is doing. And why I'm trying to get a
>>> generic dynamic kallsym thing sorted. There's bound to be other
>>> code-gen things at some point.
>>
>> Hi Peter,
>>
>> I guess you are looking for something for all ksym add/delete events, like:
>>
>> /*
>>  * PERF_RECORD_KSYMBOL
>>  *
>>  * struct {
>>  *         struct perf_event_header header;
>>  *         u64                      addr;
>>  *         u32                      len;
>>  *         u16                      ksym_type;
>>  *         u16                      flags;
>>  *         char                     name[];
>>  *         struct sample_id         sample_id;
>>  * };
>>  */
>
> Can't this reuse PERF_RECORD_MMAP2 with some bit in the header to mean
> that the name is the symbol name, not a path to some ELF/whatever? The
> ksym type could be encoded in the prot field, PROT_EXEC for functions,
> PROT_READ for read only data, PROT_WRITE for rw data.

Thanks Arnaldo!

I think this works. PERF_RECORD_MMAP2 has many bits in it, so we can encode
a lot of details. We can even have a bit to differentiate map/unmap.
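
To make the prot encoding concrete, here is a rough userspace sketch of the
mapping Arnaldo describes (PROT_EXEC for functions, PROT_READ for read-only
data, PROT_WRITE for rw data). The enum and both helper names are
hypothetical and only for illustration; nothing like them exists in the uapi
headers, and the real encoding is still open for discussion.

```c
#include <stdint.h>
#include <sys/mman.h>	/* PROT_READ, PROT_WRITE, PROT_EXEC */

/* Hypothetical symbol kinds; not part of any existing header. */
enum ksym_kind {
	KSYM_KIND_FUNC,		/* executable code, e.g. a JITed program */
	KSYM_KIND_RODATA,	/* read-only data */
	KSYM_KIND_DATA,		/* read-write data */
};

/* Encode a symbol kind into the prot field of an MMAP2-style record. */
static uint32_t ksym_kind_to_prot(enum ksym_kind kind)
{
	switch (kind) {
	case KSYM_KIND_FUNC:
		return PROT_EXEC;
	case KSYM_KIND_RODATA:
		return PROT_READ;
	case KSYM_KIND_DATA:
		return PROT_WRITE;
	}
	return 0;
}

/* Decode the prot field back into a symbol kind on the tool side. */
static enum ksym_kind prot_to_ksym_kind(uint32_t prot)
{
	if (prot & PROT_EXEC)
		return KSYM_KIND_FUNC;
	if (prot & PROT_WRITE)
		return KSYM_KIND_DATA;
	return KSYM_KIND_RODATA;
}
```

A map/unmap bit would need its own place in the record; where it would live
is not decided here.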

>
> If we do it that way older tools will show the DSO name and an
> unresolved symbol, and even an indication if it's a function or data,
> which is better than not showing anything when processing a new
> PERF_RECORD_KSYMBOL.

For compatibility, we can use the attr.bpf_event bit (or attr.mmap2_plus)
to turn the new variations of PERF_RECORD_MMAP2 on and off. Unless the user
runs perf-record and perf-report with different versions of the perf tools,
we should not see weird events.

>
> New tools, seeing the perf_event_attr.header bit will know that this is
> a "map" with just one symbol and will show that for both DSO name and
> symbol.
>
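
The old-tool versus new-tool behavior described above could look roughly
like the sketch below. This is a simplified userspace model, not perf code:
struct mmap2_view, the name_is_symbol flag, and resolve_name() are all
hypothetical names made up for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of a decoded MMAP2-style record. */
struct mmap2_view {
	const char *filename;	/* ELF path, or a symbol name in the new variation */
	bool name_is_symbol;	/* hypothetical "name is a symbol" bit */
};

/* What the tool ends up showing for a sample inside this map. */
struct resolved_name {
	const char *dso;	/* DSO name column */
	const char *sym;	/* symbol column, NULL if unresolved */
};

/*
 * An old tool ignores the new bit and shows the name as the DSO with an
 * unresolved symbol. A new tool treats the record as a "map" with a
 * single symbol and shows the name in both columns.
 */
static struct resolved_name resolve_name(const struct mmap2_view *rec,
					 bool tool_knows_bit)
{
	struct resolved_name r = { rec->filename, NULL };

	if (tool_knows_bit && rec->name_is_symbol)
		r.sym = rec->filename;
	return r;
}
```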

Hi Peter,

Could you please share your comments/suggestions on Arnaldo's proposal?

Thanks,
Song