Re: [PATCH bpf-next] bpf: Add bpf_read_raw_record() helper

From: Song Liu
Date: Fri Aug 26 2022 - 14:48:31 EST

> On Aug 26, 2022, at 11:09 AM, Song Liu <songliubraving@xxxxxx> wrote:
>
>
>
>> On Aug 26, 2022, at 9:33 AM, Namhyung Kim <namhyung@xxxxxxxxxx> wrote:
>>
>> On Thu, Aug 25, 2022 at 10:53 PM Song Liu <song@xxxxxxxxxx> wrote:
>>>
>>> On Thu, Aug 25, 2022 at 10:22 PM Namhyung Kim <namhyung@xxxxxxxxxx> wrote:
>>>>
>>>> On Thu, Aug 25, 2022 at 7:35 PM Song Liu <songliubraving@xxxxxx> wrote:
>>>>> Actually, since we are on this, can we make it more generic, and handle
>>>>> all possible PERF_SAMPLE_* (in enum perf_event_sample_format)? Something
>>>>> like:
>>>>>
>>>>> long bpf_perf_event_read_sample(void *ctx, void *buf, u64 size, u64 flags);
>>>>>
>>>>> WDYT Namhyung?
>>>>
>>>> Do you mean reading the whole sample data at once?
>>>> Then it needs to parse the sample data format properly
>>>> which is non-trivial due to a number of variable-length
>>>> fields like callchains and branch stack, etc.
>>>>
>>>> Also I'm afraid I might need event configuration info
>>>> other than sample data like attr.type, attr.config,
>>>> attr.sample_type and so on.
>>>>
>>>> Hmm.. maybe we can add it to the ctx directly like ctx.attr_type?
>>>
>>> The user should have access to the perf_event_attr used to
>>> create the event. This is also available in ctx->event->attr.
>>
>> Do you mean from BPF? I'd like to have a generic BPF program
>> that can handle various filtering according to the command line
>> arguments. I'm not sure but it might do something differently
>> for each event based on the attr settings.
>
> Yeah, we can access perf_event_attr from a BPF program. Note that
> the ctx for a perf_event BPF program is struct bpf_perf_event_data_kern:
>
> SEC("perf_event")
> int perf_e(struct bpf_perf_event_data_kern *ctx)
> {
>         ...
> }
>
> struct bpf_perf_event_data_kern {
>         bpf_user_pt_regs_t *regs;
>         struct perf_sample_data *data;
>         struct perf_event *event;
> };
>
> Alternatively, we can also have bpf user space configure the BPF
> program via a few knobs.
>
> And actually, we can just read ctx->data and get the raw record,
> right..?

I played with this a little. ctx->data does not always appear to be
reliable. My guess (not 100% sure) is that this is because we call
the BPF program before event->orig_overflow_handler. We could
probably add a flag to request that orig_overflow_handler be called
first.
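
To make the ordering guess concrete, here is a toy user-space model in
plain C. It is not kernel code: every toy_* name and the run_orig_first
flag are made up for illustration, and it simply assumes that the
original handler is what fills in the sample data.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for the sample data; NOT the kernel's perf_sample_data. */
struct toy_sample {
	bool     prepared;  /* true once the original handler has filled it in */
	uint64_t raw_size;  /* only meaningful when prepared is true */
};

struct toy_event;
typedef void (*toy_overflow_fn)(struct toy_event *ev, struct toy_sample *data);
typedef int (*toy_prog_fn)(const struct toy_sample *data);

struct toy_event {
	toy_overflow_fn orig_overflow_handler;
	toy_prog_fn     prog;            /* stands in for the BPF program */
	bool            run_orig_first;  /* hypothetical flag, does not exist today */
};

/* Assumption of this model: the original handler populates the sample. */
static void toy_orig_handler(struct toy_event *ev, struct toy_sample *data)
{
	(void)ev;
	data->raw_size = 64;
	data->prepared = true;
}

/* The "program" just reports whether the data it sees is populated. */
static int toy_prog(const struct toy_sample *data)
{
	return data->prepared ? 1 : 0;
}

/* Returns what the program observed: 1 if the sample was already prepared. */
static int toy_overflow(struct toy_event *ev)
{
	struct toy_sample data = { .prepared = false, .raw_size = 0 };
	int seen;

	assert(ev->prog != NULL && ev->orig_overflow_handler != NULL);

	if (ev->run_orig_first) {
		/* Proposed order: original handler first, then the program. */
		ev->orig_overflow_handler(ev, &data);
		return ev->prog(&data);
	}

	/* Order guessed above: the program runs on not-yet-prepared data. */
	seen = ev->prog(&data);
	ev->orig_overflow_handler(ev, &data);
	return seen;
}

/* Convenience wrapper: build a toy event and run one overflow. */
int toy_run(bool run_orig_first)
{
	struct toy_event ev = {
		.orig_overflow_handler = toy_orig_handler,
		.prog                  = toy_prog,
		.run_orig_first        = run_orig_first,
	};

	return toy_overflow(&ev);
}
```

In this model toy_run(false) returns 0 (the program saw unprepared
data) and toy_run(true) returns 1. Whether the real overflow path
behaves this way is exactly the part I have not confirmed.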

Thanks,
Song