Re: [PATCH v3 perf, bpf-next 1/4] perf, bpf: Introduce PERF_RECORD_BPF_EVENT

From: Song Liu
Date: Thu Dec 13 2018 - 11:08:44 EST

> On Dec 13, 2018, at 7:25 AM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
> On Wed, Dec 12, 2018 at 06:56:11PM +0000, Song Liu wrote:
>>
>>
>>> On Dec 12, 2018, at 10:05 AM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>>>
>>> On Wed, Dec 12, 2018 at 05:09:17PM +0000, Song Liu wrote:
>>>>> And while this tracks the bpf kallsyms, it does not do all kallsyms.
>>>>>
>>>>> .... Oooh, I see the problem, everybody is doing their own custom
>>>>> kallsym_{add,del}() thing, instead of having that in generic code :-(
>>>>>
>>>>> This, for example, doesn't track module load/unload nor ftrace
>>>>> trampolines, even though both affect kallsyms.
>>>>
>>>> I think we can use PERF_RECORD_MMAP (or MMAP2) for module load/unload.
>>>> That could be a separate set of patches.
>>>
>>> So I would actually like to move bpf_lock/bpf_kallsyms/bpf_tree +
>>> bpf_prog_kallsyms_*() + __bpf_address_lookup() into kernel/kallsyms.c
>>> and also have ftrace use that.
>>>
>>> Because currently the ftrace stuff is otherwise invisible.
>>>
>>> A generic kallsym register/unregister for any JIT.
>>
>> I guess this is _not_ a requirement for this patchset? A BPF program has
>> special data (id, sub_id, tag) for which we need PERF_RECORD_BPF_EVENT,
>> so this patchset should be orthogonal to the generic kallsym framework?
>
> Well, it is a question of ABI. I don't like mixing the kallsym updates
> with the BPF updates.

I have always thought of the two as a single update: "map this BPF
program to this ksym".

On the other hand, if we really want to separate the two, I guess we
need two PERF_RECORD_* types:

/*
 * PERF_RECORD_KSYM_ADD/DEL or MMAP3/MUNMAP3
 *
 * struct {
 *     struct perf_event_header header;
 *     u64                      addr;
 *     u64                      len;
 *     char                     name[];
 *     struct sample_id         sample_id;
 * };
 */
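To make the first layout concrete, here is a minimal C sketch of how a tool might pull addr, len and name out of such a record. It assumes (hypothetically) that name[] is NUL-terminated and padded within header.size, the way PERF_RECORD_MMAP's filename is, and it stops before the trailing sample_id. ksym_record, ksym_info and parse_ksym_record() are illustrative names only, not existing perf ABI.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same layout as the uapi struct perf_event_header. */
struct perf_event_header {
	uint32_t type;
	uint16_t misc;
	uint16_t size;
};

/* Hypothetical fixed prefix of PERF_RECORD_KSYM_ADD/DEL; name[] follows,
 * then sample_id (not parsed here). */
struct ksym_record {
	struct perf_event_header header;
	uint64_t addr;
	uint64_t len;
	char name[];
};

struct ksym_info {
	uint64_t addr;
	uint64_t len;
	const char *name;  /* points into the record buffer */
	size_t name_len;   /* length without the terminating NUL */
};

/* Returns 0 on success, -1 if the record is truncated or the name has no
 * NUL inside header.size (assumed padding convention). */
static int parse_ksym_record(const void *buf, size_t buf_len, struct ksym_info *out)
{
	const struct ksym_record *rec = buf;
	const char *nul;
	size_t avail;

	assert(buf != NULL && out != NULL);
	if (buf_len < sizeof(*rec) || rec->header.size > buf_len ||
	    rec->header.size < sizeof(*rec))
		return -1;

	/* Bytes left for name[] (plus padding) inside this record. */
	avail = rec->header.size - sizeof(*rec);
	nul = memchr(rec->name, '\0', avail);
	if (!nul)
		return -1;

	out->addr = rec->addr;
	out->len = rec->len;
	out->name = rec->name;
	out->name_len = (size_t)(nul - rec->name);
	return 0;
}
```

The variable-length name is why header.size has to be consulted before anything after name[] (such as sample_id) can be located.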

and

/*
 * PERF_RECORD_BPF_EVENT
 *
 * struct {
 *     struct perf_event_header header;
 *     u32                      type;
 *     u32                      flags;
 *     u32                      id;      // prog_id or other id
 *     u32                      sub_id;  // subprog id
 *
 *     // for bpf_prog types, bpf prog or subprog
 *     u8                       tag[BPF_TAG_SIZE];
 *     struct sample_id         sample_id;
 * };
 */
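For the second layout, a quick C check shows that the fixed fields pack into 32 bytes with no compiler padding, so sample_id would start at a multiple of 8 bytes. This is a sketch: bpf_event_record and bpf_tag_to_hex() are hypothetical names, and BPF_TAG_SIZE is taken as 8, its value in include/uapi/linux/bpf.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define BPF_TAG_SIZE 8 /* value from include/uapi/linux/bpf.h */

/* Same layout as the uapi struct perf_event_header. */
struct perf_event_header {
	uint32_t type;
	uint16_t misc;
	uint16_t size;
};

/* Fixed part of the proposed PERF_RECORD_BPF_EVENT; sample_id follows
 * (when sample_id_all is set) and is not modeled here. */
struct bpf_event_record {
	struct perf_event_header header;
	uint32_t type;
	uint32_t flags;
	uint32_t id;     /* prog_id or other id */
	uint32_t sub_id; /* subprog id */
	uint8_t tag[BPF_TAG_SIZE];
};

/* Each field sits on its natural alignment, so no padding is inserted and
 * the fixed part is exactly four u64s. */
static_assert(offsetof(struct bpf_event_record, id) == 16, "id offset");
static_assert(offsetof(struct bpf_event_record, tag) == 24, "tag offset");
static_assert(sizeof(struct bpf_event_record) == 32, "fixed part size");

/* Render the tag as 16 lowercase hex digits plus a terminating NUL. */
static void bpf_tag_to_hex(const uint8_t tag[BPF_TAG_SIZE],
			   char out[2 * BPF_TAG_SIZE + 1])
{
	for (int i = 0; i < BPF_TAG_SIZE; i++)
		snprintf(out + 2 * i, 3, "%02x", tag[i]);
}
```

Keeping the fixed part a multiple of 8 bytes matters because perf records in the ring buffer are u64-aligned.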

In this case, PERF_RECORD_BPF_EVENT is only needed when the user wants
annotation. When annotation is needed, the kernel will generate both
records for each BPF program load/unload, and user space will then
have to match the two.
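The split scheme does not say how user space pairs the two records. One plausible approach, assumed here, relies on BPF ksym names embedding the program tag as bpf_prog_<tag> or bpf_prog_<tag>_<name>. The C sketch below matches on that; ksym_event, bpf_event, ksym_matches_tag() and match_bpf_event() are hypothetical names. Since identical programs share a tag, a real tool would probably also need ordering information (e.g. timestamps from sample_id) to disambiguate.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define BPF_TAG_SIZE 8

/* Hypothetical user-space views of the two proposed records. */
struct ksym_event {
	uint64_t addr;
	uint64_t len;
	const char *name;
};

struct bpf_event {
	uint32_t id;
	uint32_t sub_id;
	uint8_t tag[BPF_TAG_SIZE];
};

/* Returns 1 if name is "bpf_prog_<hex tag>", optionally followed by
 * "_<prog name>" (assumed naming convention), else 0. */
static int ksym_matches_tag(const char *name, const uint8_t tag[BPF_TAG_SIZE])
{
	static const char prefix[] = "bpf_prog_";
	const size_t plen = sizeof(prefix) - 1;
	char hex[2 * BPF_TAG_SIZE + 1];

	if (strncmp(name, prefix, plen) != 0)
		return 0;
	for (int i = 0; i < BPF_TAG_SIZE; i++)
		snprintf(hex + 2 * i, 3, "%02x", tag[i]);
	if (strncmp(name + plen, hex, 2 * BPF_TAG_SIZE) != 0)
		return 0;

	/* The tag must end the name or be followed by "_<prog name>". */
	char next = name[plen + 2 * BPF_TAG_SIZE];
	return next == '\0' || next == '_';
}

/* Index of the first ksym whose name carries ev's tag, or -1 if none. */
static int match_bpf_event(const struct bpf_event *ev,
			   const struct ksym_event *ksyms, size_t n)
{
	assert(ev != NULL);
	for (size_t i = 0; i < n; i++)
		if (ksym_matches_tag(ksyms[i].name, ev->tag))
			return (int)i;
	return -1;
}
```

This is the extra user-space work the split ABI implies; with a combined record the kernel already states which ksym belongs to which program.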

Personally, I think this is not as clean as the current version, but
it would work.

Would you recommend we go in this direction?

Thanks,
Song