Re: [PATCH bpf-next 2/3] bpf: Add bpf_perf_event_read_sample() helper

From: Namhyung Kim
Date: Tue Nov 01 2022 - 18:17:14 EST


Hi,

On Tue, Nov 1, 2022 at 1:04 PM Song Liu <song@xxxxxxxxxx> wrote:
>
> On Tue, Nov 1, 2022 at 11:53 AM Alexei Starovoitov
> <alexei.starovoitov@xxxxxxxxx> wrote:
> >
> > On Tue, Nov 1, 2022 at 11:47 AM Song Liu <song@xxxxxxxxxx> wrote:
> > >
> > > On Tue, Nov 1, 2022 at 11:26 AM Alexei Starovoitov
> > > <alexei.starovoitov@xxxxxxxxx> wrote:
> > > >
> > > > On Tue, Nov 1, 2022 at 3:03 AM Jiri Olsa <olsajiri@xxxxxxxxx> wrote:
> > > > >
> > > > > On Mon, Oct 31, 2022 at 10:23:39PM -0700, Namhyung Kim wrote:
> > > > > > The bpf_perf_event_read_sample() helper is to get the specified sample
> > > > > > data (by using PERF_SAMPLE_* flag in the argument) from BPF to make a
> > > > > > decision for filtering on samples. Currently PERF_SAMPLE_IP and
> > > > > > > PERF_SAMPLE_ADDR flags are supported only.
> > > > > >
> > > > > > Signed-off-by: Namhyung Kim <namhyung@xxxxxxxxxx>
> > > > > > ---
> > > > > > include/uapi/linux/bpf.h | 23 ++++++++++++++++
> > > > > > kernel/trace/bpf_trace.c | 49 ++++++++++++++++++++++++++++++++++
> > > > > > tools/include/uapi/linux/bpf.h | 23 ++++++++++++++++
> > > > > > 3 files changed, 95 insertions(+)
> > > > > >
> > > > > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > > > > > index 94659f6b3395..cba501de9373 100644
> > > > > > --- a/include/uapi/linux/bpf.h
> > > > > > +++ b/include/uapi/linux/bpf.h
> > > > > > @@ -5481,6 +5481,28 @@ union bpf_attr {
> > > > > > * 0 on success.
> > > > > > *
> > > > > > * **-ENOENT** if the bpf_local_storage cannot be found.
> > > > > > + *
> > > > > > + * long bpf_perf_event_read_sample(struct bpf_perf_event_data *ctx, void *buf, u32 size, u64 sample_flags)
> > > > > > + * Description
> > > > > > + * For an eBPF program attached to a perf event, retrieve the
> > > > > > + * sample data associated to *ctx* and store it in the buffer
> > > > > > + * pointed by *buf* up to size *size* bytes.
> > > > > > + *
> > > > > > + * The *sample_flags* should contain a single value in the
> > > > > > + * **enum perf_event_sample_format**.
> > > > > > + * Return
> > > > > > + * On success, number of bytes written to *buf*. On error, a
> > > > > > + * negative value.
> > > > > > + *
> > > > > > + * The *buf* can be set to **NULL** to return the number of bytes
> > > > > > + * required to store the requested sample data.
> > > > > > + *
> > > > > > + * **-EINVAL** if *sample_flags* is not a PERF_SAMPLE_* flag.
> > > > > > + *
> > > > > > + * **-ENOENT** if the associated perf event doesn't have the data.
> > > > > > + *
> > > > > > + * **-ENOSYS** if system doesn't support the sample data to be
> > > > > > + * retrieved.
> > > > > > */
> > > > > > #define ___BPF_FUNC_MAPPER(FN, ctx...) \
> > > > > > FN(unspec, 0, ##ctx) \
> > > > > > @@ -5695,6 +5717,7 @@ union bpf_attr {
> > > > > > FN(user_ringbuf_drain, 209, ##ctx) \
> > > > > > FN(cgrp_storage_get, 210, ##ctx) \
> > > > > > FN(cgrp_storage_delete, 211, ##ctx) \
> > > > > > + FN(perf_event_read_sample, 212, ##ctx) \
> > > > > > /* */
> > > > > >
> > > > > > /* backwards-compatibility macros for users of __BPF_FUNC_MAPPER that don't
> > > > > > diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> > > > > > index ce0228c72a93..befd937afa3c 100644
> > > > > > --- a/kernel/trace/bpf_trace.c
> > > > > > +++ b/kernel/trace/bpf_trace.c
> > > > > > @@ -28,6 +28,7 @@
> > > > > >
> > > > > > #include <uapi/linux/bpf.h>
> > > > > > #include <uapi/linux/btf.h>
> > > > > > +#include <uapi/linux/perf_event.h>
> > > > > >
> > > > > > #include <asm/tlb.h>
> > > > > >
> > > > > > @@ -1743,6 +1744,52 @@ static const struct bpf_func_proto bpf_read_branch_records_proto = {
> > > > > > .arg4_type = ARG_ANYTHING,
> > > > > > };
> > > > > >
> > > > > > +BPF_CALL_4(bpf_perf_event_read_sample, struct bpf_perf_event_data_kern *, ctx,
> > > > > > + void *, buf, u32, size, u64, flags)
> > > > > > +{
> > > > >
> > > > > I wonder we could add perf_btf (like we have tp_btf) program type that
> > > > > could access ctx->data directly without helpers
> > > > >
> > > > > > + struct perf_sample_data *sd = ctx->data;
> > > > > > + void *data;
> > > > > > + u32 to_copy = sizeof(u64);
> > > > > > +
> > > > > > + /* only allow a single sample flag */
> > > > > > + if (!is_power_of_2(flags))
> > > > > > + return -EINVAL;
> > > > > > +
> > > > > > + /* support reading only already populated info */
> > > > > > + if (flags & ~sd->sample_flags)
> > > > > > + return -ENOENT;
> > > > > > +
> > > > > > + switch (flags) {
> > > > > > + case PERF_SAMPLE_IP:
> > > > > > + data = &sd->ip;
> > > > > > + break;
> > > > > > + case PERF_SAMPLE_ADDR:
> > > > > > + data = &sd->addr;
> > > > > > + break;
> > > > >
> > > > > AFAICS from pe_prog_convert_ctx_access you should be able to read addr
> > > > > directly from context right? same as sample_period.. so I think if this
> > > > > will be generic way to read sample data, should we add sample_period
> > > > > as well?
> > > >
> > > > +1
> > > > Let's avoid new stable helpers for this.
> > > > Pls use CORE and read perf_sample_data directly.
> > >
> > > We have legacy ways to access sample_period and addr with
> > > struct bpf_perf_event_data and struct bpf_perf_event_data_kern. I
> > > think mixing that with CORE makes it confusing for the user. And a
> > > helper or a kfunc would make it easier to follow. perf_btf might
> > > also be a good approach for this.
> >
> > imo that's a counter argument to non-CORE style.
> > struct bpf_perf_event_data has sample_period and addr,
> > and as soon as we pushed the boundaries it turned out it's not enough.
> > Now we're proposing to extend uapi a bit with sample_ip.
> > That will repeat the same mistake.
> > Just use CORE and read everything that is there today
> > and will be there in the future.
>
> Another work of this effort is that we need the perf_event to prepare
> required fields before calling the BPF program. I think we will need
> some logic in addition to CORE to get that right. How about we add
> perf_btf where the perf_event prepare all fields before calling the
> BPF program? perf_btf + CORE will be able to read all fields in the
> sample.

IIUC, we want something like the following to access sample data
directly, right?

BPF_CORE_READ(ctx, data, ip);

Some fields, like raw data and callchains, have variable-length data,
so it'd be hard to check the bounds at load time. It's also possible
that some fields are not set (depending on the sample type), so it'd be
the user's (or programmer's) responsibility to check that the data is
valid before reading it. If these are not concerns, I think I'm good.
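
To make the two concerns concrete, here is a rough user-space sketch
(not BPF or kernel code). struct mock_sample_data, MOCK_MAX_STACK,
read_u64_field() and copy_callchain() are made up for illustration;
only the PERF_SAMPLE_* bit values are taken from
enum perf_event_sample_format. It assumes the reader checks
sample_flags before touching a field and clamps variable-length data
at run time:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Bit values as in enum perf_event_sample_format (uapi/linux/perf_event.h). */
#define PERF_SAMPLE_IP        (1U << 0)
#define PERF_SAMPLE_ADDR      (1U << 3)
#define PERF_SAMPLE_CALLCHAIN (1U << 5)

#define MOCK_MAX_STACK 8	/* hypothetical cap, for this mock only */

/* Hypothetical, trimmed-down stand-in for struct perf_sample_data. */
struct mock_sample_data {
	uint64_t sample_flags;	/* which fields below were populated */
	uint64_t ip;
	uint64_t addr;
	uint64_t callchain_nr;	/* number of valid callchain entries */
	uint64_t callchain[MOCK_MAX_STACK];
};

/*
 * Read a fixed-size u64 field, but only if it was populated.
 * Returns 0 on success, -1 if the field is unset or unsupported.
 */
static int read_u64_field(const struct mock_sample_data *sd,
			  uint64_t flag, uint64_t *out)
{
	assert(sd && out);

	if (!(sd->sample_flags & flag))
		return -1;	/* field not set for this sample type */

	switch (flag) {
	case PERF_SAMPLE_IP:
		*out = sd->ip;
		return 0;
	case PERF_SAMPLE_ADDR:
		*out = sd->addr;
		return 0;
	default:
		return -1;
	}
}

/*
 * Copy variable-length callchain data with a run-time bound.
 * Returns the number of entries copied, or -1 if not populated.
 */
static int copy_callchain(const struct mock_sample_data *sd,
			  uint64_t *dst, uint32_t max_entries)
{
	uint64_t nr;

	assert(sd && dst);

	if (!(sd->sample_flags & PERF_SAMPLE_CALLCHAIN))
		return -1;

	nr = sd->callchain_nr;
	if (nr > MOCK_MAX_STACK)	/* never trust the length blindly */
		nr = MOCK_MAX_STACK;
	if (nr > max_entries)		/* clamp to the caller's buffer */
		nr = max_entries;

	memcpy(dst, sd->callchain, nr * sizeof(*dst));
	return (int)nr;
}
```

With direct access, both checks would have to be written by the BPF
program author, whereas the helper in this patch does the sample_flags
check itself.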

Thanks,
Namhyung