Re: [RFC PATCH 00/22] perf tools: introduce 'perf bpf' command to load eBPF programs.

From: Wang Nan
Date: Tue May 05 2015 - 00:42:42 EST


On 2015/5/5 11:02, Alexei Starovoitov wrote:
> On 5/2/15 12:19 AM, Wang Nan wrote:
>>
>> I'd like to do the following work in the next version (based on my experience and feedback):
>>
>> 1. Safely clean up kprobe points after unloading;
>>
>> 2. Add subcommand space to 'perf bpf'. The current stuff should reside in 'perf bpf load';
>>
>> 3. Extract the eBPF ELF walking and collecting work into a separate library to help others.
>
> that's a good list.
>
> The feedback for existing patches:
> patch 18 - since we're creating a generic library for bpf elf
> loading it would be great to do the following:
> first try to load with
> attr.log_buf = NULL;
> attr.log_level = 0;
> then only if it fails, allocate a buffer and repeat with log_level = 1.
> The reason is that it's better to have fast program loading by default
> without any verbosity emitted by verifier.
>

Will do.
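
A minimal sketch of that two-pass load, not the code from patch 18. The
struct load_attr type, the load_fn callback, LOG_BUF_SIZE, and the two stub
loaders are hypothetical names for illustration; a real loader would fill
union bpf_attr and issue the bpf(BPF_PROG_LOAD, ...) syscall inside the
callback.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for the log fields of the kernel's union bpf_attr. */
struct load_attr {
	char *log_buf;
	size_t log_size;
	unsigned int log_level;
};

/* Hypothetical callback wrapping the real load call.
 * Returns an fd >= 0 on success, -1 on failure. */
typedef int (*load_fn)(const struct load_attr *attr, void *priv);

#define LOG_BUF_SIZE 65536 /* assumed size, not taken from the patches */

/*
 * Try a silent load first. Only if that fails, allocate a log buffer and
 * repeat with log_level = 1 so the verifier can explain the rejection.
 * On failure *log_out owns the buffer (caller frees); otherwise it is NULL.
 */
static int load_with_retry(load_fn load, void *priv, char **log_out)
{
	struct load_attr attr = { .log_buf = NULL, .log_size = 0, .log_level = 0 };
	int fd;

	assert(load != NULL && log_out != NULL);
	*log_out = NULL;

	fd = load(&attr, priv);
	if (fd >= 0)
		return fd; /* fast path: no verifier output requested */

	attr.log_buf = calloc(1, LOG_BUF_SIZE);
	if (!attr.log_buf)
		return -1;
	attr.log_size = LOG_BUF_SIZE;
	attr.log_level = 1;

	fd = load(&attr, priv);
	if (fd >= 0) {
		free(attr.log_buf);
		return fd;
	}
	*log_out = attr.log_buf;
	return -1;
}

/* Stub loaders for trying the sketch without a kernel. priv counts calls. */
static int stub_accept(const struct load_attr *attr, void *priv)
{
	(void)attr;
	(*(int *)priv)++;
	return 3; /* pretend fd */
}

static int stub_reject(const struct load_attr *attr, void *priv)
{
	(*(int *)priv)++;
	if (attr->log_level && attr->log_buf)
		snprintf(attr->log_buf, attr->log_size, "stub: rejected");
	return -1;
}
```

With stub_accept the loader is called once and no buffer is allocated; with
stub_reject it is called twice and the caller gets the log text back.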

> patch 19 - I think it's unnecessary.
> verifier already dumps it. so this '-v' flag can be translated into
> verbose loading.
> There is also .s output from llvm for those interested in bpf asm
> instructions.
>

That's great. Could you please add a description of 'llvm -s' to your README
or comments? It cost me a lot of time to dump eBPF instructions, so I decided to
add it to perf...

>> My colleague He Kuang is working on variable access. Probing inside a function body
>> and accessing its local variables will be supported like this:
>>
>> SEC("config") char _prog_config[] = "prog: func_name:1234 vara=localvara";
>> int prog(struct pt_regs *ctx, unsigned long vara) {
>> // vara is the value of localvara of function func_name
>> }
>
> that would be great. I'm not sure though how you can achieve that
> without changing C front-end ?

It's not very difficult. He is trying to generate the loader of vara
as a prologue, then paste the prologue and the main eBPF program together.
From the viewpoint of the kernel bpf verifier, there is only one param (ctx); the
prologue fetches the value of vara and puts it into the proper register,
then the main program runs.
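
A rough sketch of the pasting step, for the simple case where the variable
lives in a register saved in pt_regs. This is not He Kuang's implementation:
struct insn is a local mirror of the kernel's struct bpf_insn layout, and
prepend_prologue() and its reg_off parameter are hypothetical. It assumes the
main program expects its second C argument in R2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of the kernel's struct bpf_insn layout, so the sketch
 * builds without kernel headers. */
struct insn {
	uint8_t code;
	uint8_t dst_reg:4;
	uint8_t src_reg:4;
	int16_t off;
	int32_t imm;
};

#define OP_LDX_MEM_DW 0x79 /* BPF_LDX | BPF_MEM | BPF_DW */
#define REG_CTX  1         /* R1 holds ctx on entry */
#define REG_ARG2 2         /* R2 carries the second argument */

/*
 * Hypothetical helper: emit "r2 = *(u64 *)(r1 + reg_off)" followed by the
 * unmodified main program. reg_off is the offset inside struct pt_regs of
 * the register holding the variable; the caller gets it from debug info.
 * Returns the total instruction count, or 0 if out[] is too small.
 */
static size_t prepend_prologue(int16_t reg_off,
			       const struct insn *main_prog, size_t n_main,
			       struct insn *out, size_t n_out)
{
	assert(main_prog != NULL && out != NULL);
	if (n_out < n_main + 1)
		return 0;

	memset(&out[0], 0, sizeof(out[0]));
	out[0].code = OP_LDX_MEM_DW;
	out[0].dst_reg = REG_ARG2;
	out[0].src_reg = REG_CTX;
	out[0].off = reg_off;

	/* BPF jump offsets are relative, so the main program is copied as-is. */
	memcpy(&out[1], main_prog, n_main * sizeof(*main_prog));
	return n_main + 1;
}
```

The verifier still sees a single program whose only input is ctx in R1; a
variable that lives on the stack would need a longer prologue than this.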

Another possible solution is to change the protocol between kprobes and the eBPF
program: make kprobes call the fetchers and pass the results to the eBPF program as
a second param (grouping all varx together).
A prologue may still be needed in this case to load each param into the correct
register.

> This type of feature is exactly the reason why we're trying to write
> our front-end.
> In general there are two ways to achieve 'restricted C' language:
> - start from clang and chop all features that are not supported.
> I believe Jovi already tried to do that and it became very difficult.
> - start from simple front-end with minimal C and add all things one by
> one. That's what we're trying to do. So far we have most of normal
> syntax. The problem with our approach is that we cannot easily do
> #include of existing .h files. We're working on that.
> It's still too experimental. Maybe we will drop it and go back to the
> first approach.
>
> The reason for extending front-end is your example above, where
> the user would want to write:
> int prog(struct pt_regs *ctx, unsigned long vara) {
> // use 'vara'
> but generated BPF should have only one 'ctx' pointer, since that's
> the only thing that verifier will accept. bpf/core and JITs expect
> only one argument, etc.
> So this func definition + 'vara' access can be compiled as ctx->si
> (if vara is actually in register) or
> bpf_probe_read(ctx->bp + magic_offset_from_debug_info)
> (if vara is on stack)
> or it can also be done via store_trace_args() but that will be slower
> and requires hacking kernel, whereas ctx->... style is pure userspace.
> Lots of things to brainstorm. So please share your progress soon.
>
>> And I want to discuss with you and others about:
>>
>> 1. How to make eBPF output its tracing and aggregation results to perf?
>
> well, the output of a bpf program is data stored in maps. Each program
> needs a corresponding user space reader/printer/sorter of this data.
> Like tracex2 prints this data as histogram and tracex3 prints it as
> heatmap. We can standardize few things like this, but ideally we
> keep it up to user. So that user can write single file that consists
> of functions that are loaded as bpf into kernel and other functions
> that are executed in user space. llvm can jit first set to bpf and
> second set to x86. That's distant future though.
> So far samples/bpf/ style of kern.c+user.c worked quite well.
>

Well, it looks like in your design BPF programs are used to produce aggregation
results. On my side, I want them to also act as trace filters.

Could you please consider the following problem?

We find that several __lock_page() calls take a very long time. We want to
find the corresponding __unlock_page() calls so we can learn what blocks them. We want to
insert an eBPF program before io_schedule() in __lock_page(), and another eBPF program
at the entry of __unlock_page(), so we can compute the interval between page locking and
unlocking. If that interval is longer than a threshold, let __unlock_page() trigger a perf sample
so we get its call stack. In this case, the eBPF program acts as a trace filter.
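
The filter logic I have in mind, as a plain user-space C simulation rather
than a loadable eBPF program. All names here (struct lock_filter,
on_lock_wait(), on_unlock(), MAX_WAITS) are hypothetical. In a real program
the table would presumably be a BPF hash map keyed by the page address and
the timestamps would come from bpf_ktime_get_ns(); how the "trigger a perf
sample" step would work is exactly the open question.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_WAITS 64 /* assumed capacity, chosen only for the sketch */

struct wait_entry {
	uintptr_t page;    /* key: address of the page, 0 means free slot */
	uint64_t start_ns; /* time recorded just before io_schedule() */
};

struct lock_filter {
	struct wait_entry waits[MAX_WAITS];
	uint64_t threshold_ns;
};

static void filter_init(struct lock_filter *f, uint64_t threshold_ns)
{
	assert(f != NULL);
	memset(f, 0, sizeof(*f));
	f->threshold_ns = threshold_ns;
}

/* Return the slot whose key equals 'page' (0 finds a free slot), or NULL. */
static struct wait_entry *find_entry(struct lock_filter *f, uintptr_t page)
{
	for (size_t i = 0; i < MAX_WAITS; i++)
		if (f->waits[i].page == page)
			return &f->waits[i];
	return NULL;
}

/* Probe before io_schedule() in __lock_page(): remember when waiting began.
 * Returns 0 on success, -1 if the key is invalid or the table is full. */
static int on_lock_wait(struct lock_filter *f, uintptr_t page, uint64_t now_ns)
{
	struct wait_entry *e;

	if (!page)
		return -1;
	if (find_entry(f, page))
		return 0; /* keep the earliest waiter's timestamp */
	e = find_entry(f, 0);
	if (!e)
		return -1;
	e->page = page;
	e->start_ns = now_ns;
	return 0;
}

/* Probe at the entry of __unlock_page(): returns 1 if the wait exceeded
 * the threshold (a perf sample should be triggered), else 0. */
static int on_unlock(struct lock_filter *f, uintptr_t page, uint64_t now_ns)
{
	struct wait_entry *e = page ? find_entry(f, page) : NULL;
	uint64_t waited;

	if (!e)
		return 0; /* nobody was waiting on this page */
	waited = now_ns - e->start_ns;
	e->page = 0;  /* free the slot */
	return waited > f->threshold_ns;
}
```

Only the unlock events that return 1 would produce a sample, so short waits
never reach perf at all.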

Thank you.


