Re: [RFC PATCH v4 10/29] bpf tools: Collect map definitions from 'maps' section

From: Wangnan (F)
Date: Wed May 27 2015 - 23:10:52 EST




On 2015/5/28 10:28, Alexei Starovoitov wrote:
On Thu, May 28, 2015 at 10:03:04AM +0800, Wangnan (F) wrote:

On 2015/5/28 9:53, Alexei Starovoitov wrote:
On Wed, May 27, 2015 at 05:19:45AM +0000, Wang Nan wrote:
If maps are used by eBPF programs, the corresponding object file(s) should
contain a section named 'maps', which contains the map definitions. This
patch copies the data of the whole section. The map data is parsed
just before map loading.

Signed-off-by: Wang Nan <wangnan0@xxxxxxxxxx>
---
...
+static int
+bpf_object__init_maps(struct bpf_object *obj, void *data,
+		      size_t size)
+{
+	if (size == 0) {
+		pr_debug("%s doesn't need map definition\n",
+			 obj->path);
+		return 0;
+	}
+
+	obj->maps_buf = malloc(size);
+	if (!obj->maps_buf) {
+		pr_warning("malloc maps failed: %s\n", obj->path);
+		return -ENOMEM;
+	}
+
+	obj->maps_buf_sz = size;
+	memcpy(obj->maps_buf, data, size);
Why copy it? To create maps and apply fixups to instructions, the
relo sections are needed anyway, so the ELF has to be open while
this section is being processed. So why copy?

By the time maps are created, the ELF file has already been closed.

I divide the libelf work into two phases: opening and loading. The ELF file is closed
at the end of the opening phase. I think some callers need the opening phase only,
for example to check metadata in an eBPF object file. In this case, we don't
need to create map file descriptors.
Loading the ELF into memory, parsing it, copying the map, prog and relo sections
just to check metadata? That doesn't sound like a real use case.
IMO it's cleaner to remember where the maps and relocations are in the loaded ELF,
then create maps, patch the copied progs and release the whole ELF.
These ELFs are all very small, so we're not talking about large memory savings,
but still.


So are you suggesting that I create maps in the opening phase?

In bpf_object__open:

struct bpf_object *bpf_object__open(const char *path)
{
	....
	if (bpf_object__elf_init(obj))
		goto out;

	/* Real useful things put here */
	....
	/* Here we collect map information */
	if (bpf_object__elf_collect(obj))
		goto out;
	....
	/* And ELF file is closed here */
	bpf_object__elf_finish(obj);
	....
}

As you can see, after bpf_object__open() returns we have no chance
to access the map data. Therefore we would have to create maps in bpf_object__open().

However, this breaks a rule of the current design: the opening phase doesn't
talk to the kernel through sys_bpf() at all. All related work is done in the loading
phase. This principle ensures that any system, whether or not it supports
sys_bpf(), can read an eBPF object without failure.

In fact, I didn't separate opening and loading when I started working on libbpf.
However, I soon found two inconveniences:
1. The uniform design doesn't allow users to adjust things before doing the real work;
2. In my development environment I write code on a server without sys_bpf() support,
and the uniform design prevents me from testing my opening-phase code there. I have to test it
in QEMU.

In addition, this copying lets libbpf open an object once and then
load - unload - load - unload it many times without reopening and reparsing the
ELF file.
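
To make the open/load split concrete, here is a minimal sketch of the idea, not the code in this patch set. Everything in it is an assumption made up for illustration: struct map_def (one possible layout of an entry in the 'maps' section), struct sketch_object, the sketch_*() functions, and the create_map_fn callback, which stands in for the sys_bpf() call so that the whole thing can be exercised on a host without sys_bpf() support.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical layout of one entry in the 'maps' section (an assumption). */
struct map_def {
	unsigned int type;
	unsigned int key_size;
	unsigned int value_size;
	unsigned int max_entries;
};

/* Hypothetical stand-in for struct bpf_object: only what the sketch needs. */
struct sketch_object {
	void *maps_buf;		/* private copy of the 'maps' section */
	size_t maps_buf_sz;
	int *map_fds;		/* NULL while the object is not loaded */
	size_t nr_map_fds;
};

/* Caller-supplied map creator; a real one would wrap sys_bpf(). */
typedef int (*create_map_fn)(const struct map_def *def);

/* Opening phase: copy the section data, never talk to the kernel. */
static int sketch_open(struct sketch_object *obj, const void *data, size_t size)
{
	memset(obj, 0, sizeof(*obj));
	if (size == 0)
		return 0;
	if (size % sizeof(struct map_def))
		return -EINVAL;
	obj->maps_buf = malloc(size);
	if (!obj->maps_buf)
		return -ENOMEM;
	memcpy(obj->maps_buf, data, size);
	obj->maps_buf_sz = size;
	return 0;
}

/* Drop the fd table; a real implementation would also close() each fd. */
static void sketch_unload(struct sketch_object *obj)
{
	free(obj->map_fds);
	obj->map_fds = NULL;
	obj->nr_map_fds = 0;
}

/* Loading phase: create one map per definition from the saved copy. */
static int sketch_load(struct sketch_object *obj, create_map_fn create)
{
	size_t i, nr = obj->maps_buf_sz / sizeof(struct map_def);
	const struct map_def *defs = obj->maps_buf;

	assert(!obj->map_fds);	/* object must not be loaded already */
	if (nr == 0)
		return 0;
	obj->map_fds = calloc(nr, sizeof(int));
	if (!obj->map_fds)
		return -ENOMEM;
	for (i = 0; i < nr; i++) {
		int fd = create(&defs[i]);

		if (fd < 0) {
			sketch_unload(obj);
			return fd;
		}
		obj->map_fds[i] = fd;
	}
	obj->nr_map_fds = nr;
	return 0;
}

/* Release everything, including the copied section. */
static void sketch_close(struct sketch_object *obj)
{
	sketch_unload(obj);
	free(obj->maps_buf);
	obj->maps_buf = NULL;
	obj->maps_buf_sz = 0;
}

/* Fake creator for hosts without sys_bpf(): returns a made-up fd number. */
static int fake_create_map(const struct map_def *def)
{
	return def->max_entries ? 100 + (int)def->type : -EINVAL;
}
```

With this shape, a caller can run sketch_open(), throw away the original section data (the ELF file being closed), and then call sketch_load() and sketch_unload() repeatedly. Only the callback passed to sketch_load() would need sys_bpf().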

Moreover, we are planning to expose hardware PMUs to eBPF in a way similar to maps,
to give eBPF programs the ability to read hardware PMU counters. I haven't thought
it through thoroughly, so I haven't discussed it with you and others yet. I think it should be
something like:

struct bpf_pmu {
	/* attr of the hardware PMU which will be passed to perf_event_open to create an FD */
};

SEC("hw_pmu")
struct bpf_pmu cache_misses = {
	...
};

SEC("lock_page=lock_page")
int lock_page_hook(struct pt_regs *ctx)
{
	...
	counter = bpf_read_pmu_counter(&cache_misses);
	...
}

(My colleague Xia Kaixu is working on it. I have added him to the CC list.)
Creating those PMU FDs may require perf to adjust more things than programs and maps do.
I believe we shouldn't let libbpf do this on its own without help from the caller. Therefore
the separation of opening and loading should be required.

What do you think?

Thank you.
