On Tue, May 24, 2022 at 12:53 AM Feng zhou <zhoufeng.zf@xxxxxxxxxxxxx> wrote:
+static void setup(void)...
+{
+ struct bpf_link *link;
+ int map_fd, i, max_entries;
+
+ setup_libbpf();
+
+ ctx.skel = bpf_map_bench__open_and_load();
+ if (!ctx.skel) {
+ fprintf(stderr, "failed to open skeleton\n");
+ exit(1);
+ }
+
+ link = bpf_program__attach(ctx.skel->progs.benchmark);
+ if (!link) {
+ fprintf(stderr, "failed to attach program!\n");
+ exit(1);
+ }
+
+ /* fill hash_map */
+ map_fd = bpf_map__fd(ctx.skel->maps.hash_map_bench);
+ max_entries = bpf_map__max_entries(ctx.skel->maps.hash_map_bench);
+ for (i = 0; i < max_entries; i++)
+ bpf_map_update_elem(map_fd, &i, &i, BPF_ANY);
+}
+SEC("fentry/" SYS_PREFIX "sys_getpgid")
+int benchmark(void *ctx)
+{
+ u32 key = bpf_get_prandom_u32();
+ u64 init_val = 1;
+
+ bpf_map_update_elem(&hash_map_bench, &key, &init_val, BPF_ANY);
+ return 0;
+}
This benchmark is artificial in the extreme.
First it populates the map up to max_entries and then
constantly bounces off the max_entries limit in a bpf prog.
Sometimes the random u32 will be less than max_entries
and map_update_elem() will hit the fast path,
but most of the time alloc_htab_elem() will fail
and map_update_elem() will fail along with it.
It does demonstrate that percpu_free_list is inefficient
when it's empty, but there is no way such a microbenchmark
justifies optimizing this corner case.
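To see that failure mode concretely, here is a hypothetical
userspace check (check_full_map() is illustrative, not part of the
patch; it assumes the default prealloc htab, that the map was filled
with keys 0..max_entries-1 as in setup(), and that libbpf sets errno
on failure). Updating an existing key in a full map still succeeds
via the in-place replace path, while a new key fails with E2BIG
because alloc_htab_elem() cannot get a free element:

#include <assert.h>
#include <errno.h>
#include <bpf/bpf.h>

static void check_full_map(int map_fd, __u32 max_entries)
{
	__u32 existing = 0;        /* inserted by setup() */
	__u32 fresh = max_entries; /* never inserted */
	__u64 val = 1;

	/* existing key: replaced in place, no allocation needed */
	assert(!bpf_map_update_elem(map_fd, &existing, &val, BPF_ANY));

	/* new key while the map is full: allocation fails with E2BIG */
	assert(bpf_map_update_elem(map_fd, &fresh, &val, BPF_ANY) < 0 &&
	       errno == E2BIG);
}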
If there is a production use case please code it up in
a benchmark.
Also there is a lot of other overhead: syscall entry/exit and atomics.
To stress map_update_elem() please use a for() loop inside the bpf prog.
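A minimal sketch of that suggestion, reusing hash_map_bench and
SYS_PREFIX from the patch (the iteration count is arbitrary, and a
bounded loop like this needs a kernel recent enough to verify it):

SEC("fentry/" SYS_PREFIX "sys_getpgid")
int benchmark(void *ctx)
{
	u64 init_val = 1;
	u32 key;
	int i;

	/* many updates per prog invocation, so the one-time syscall
	 * and attach overhead no longer dominates the measurement
	 */
	for (i = 0; i < 1024; i++) {
		key = bpf_get_prandom_u32();
		bpf_map_update_elem(&hash_map_bench, &key, &init_val,
				    BPF_ANY);
	}
	return 0;
}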