Re: [External] Re: [PATCH v2 2/2] selftest/bpf/benchs: Add bpf_map benchmark

From: Feng Zhou
Date: Wed May 25 2022 - 22:41:01 EST


在 2022/5/25 上午8:13, Alexei Starovoitov 写道:
On Tue, May 24, 2022 at 12:53 AM Feng zhou <zhoufeng.zf@xxxxxxxxxxxxx> wrote:
+static void setup(void)
+{
+ struct bpf_link *link;
+ int map_fd, i, max_entries;
+
+ setup_libbpf();
+
+ ctx.skel = bpf_map_bench__open_and_load();
+ if (!ctx.skel) {
+ fprintf(stderr, "failed to open skeleton\n");
+ exit(1);
+ }
+
+ link = bpf_program__attach(ctx.skel->progs.benchmark);
+ if (!link) {
+ fprintf(stderr, "failed to attach program!\n");
+ exit(1);
+ }
+
+ //fill hash_map
+ map_fd = bpf_map__fd(ctx.skel->maps.hash_map_bench);
+ max_entries = bpf_map__max_entries(ctx.skel->maps.hash_map_bench);
+ for (i = 0; i < max_entries; i++)
+ bpf_map_update_elem(map_fd, &i, &i, BPF_ANY);
+}
...
+SEC("fentry/" SYS_PREFIX "sys_getpgid")
+int benchmark(void *ctx)
+{
+ u32 key = bpf_get_prandom_u32();
+ u64 init_val = 1;
+
+ bpf_map_update_elem(&hash_map_bench, &key, &init_val, BPF_ANY);
+ return 0;
+}
This benchmark is artificial at its extreme.
First it populates the map till max_entries and then
constantly bounces off the max_entries limit in a bpf prog.
Sometimes random_u32 will be less than max_entries
and map_update_elem will hit a fast path,
but most of the time it will fail to alloc_htab_elem()
and will fail to map_update_elem.

It does demonstrate that percpu_free_list is inefficient
when it's empty, but there is no way such a microbenchmark
justifies optimizing this corner case.

If there is a production use case please code it up in
a benchmark.

This corner case is not easy to reproduce. In the scenario of a surge in network traffic,
the map is full, and there are still a large number of update operations.
Just like Yonghong Song says
'''
in your use case, you have lots of different keys and your intention is NOT to capture all the keys in
the hash table. So given a hash table, it is possible that the hash
will become full even if you increase the hashtable size.
Maybe you will occasionally delete some keys which will free some
space but the space will be quickly occupied by the new updates.
'''


Also there is a lot of other overhead: syscall and atomic-s.
To stress map_update_elem please use a for() loop inside bpf prog.

Ok, I will modify the way the test case is tested.
And add this benchmark just to reproduce the case. As for whether to optimize this case, I use ftrace
'''
cd /sys/kernel/debug/tracing/
echo > trace
echo htab_map_update_elem > set_graph_function
echo function_graph > current_tracer
cat per_cpu/cpu0/trace
echo nop > current_tracer
'''
To confirm whether the update operation will continue to grab the spin-lock of each cpu when the map is full.
Then close ftrace, check the time-consuming update and whether there is any improvement before patching.