Re: [PATCH bpf-next] bpf: make __htab_lookup_and_delete_batch faster when map is almost empty

From: Yonghong Song
Date: Sat Aug 01 2020 - 12:59:58 EST




On 7/31/20 9:57 PM, Brian Vazquez wrote:
While running some experiments it was observed that map_lookup_batch was much
slower than get_next_key + lookup when the syscall overhead is minimal.
This was because the map_lookup_batch implementation was more expensive
traversing empty buckets, this can be really costly when the pre-allocated
map is too big.

This patch optimizes the case when the bucket is empty so we can move quickly
to next bucket.

The benchmark to exercise this is as follows:

-The map was populate with a single entry to make sure that the syscall overhead
is not helping the map_batch_lookup.
-The size of the preallocated map was increased to show the effect of
traversing empty buckets.

Results:

Using get_next_key + lookup:

Benchmark Time(ns) CPU(ns) Iteration
---------------------------------------------------------------
BM_DumpHashMap/1/1k 3593 3586 192680
BM_DumpHashMap/1/4k 6004 5972 100000
BM_DumpHashMap/1/16k 15755 15710 44341
BM_DumpHashMap/1/64k 59525 59376 10000

I think "BM_DumpHashMap/1/64k" means the program "BM_DumpHashMap",
the map having only "1" entry, and the map preallocated size is "64k"?
What is the "Iteration" here? The number of runs with the same dump?
The CPU(ns) is the system cpu consumption, right? The Time/CPU is for
all iterations, not just one, right? It would be good
if the above results can be described better, so people can
understand the results better.


Using htab_lookup_batch before this patch:
Benchmark Time(ns) CPU(ns) Iterations
---------------------------------------------------------------
BM_DumpHashMap/1/1k 3933 3927 177978
BM_DumpHashMap/1/4k 9192 9177 73951
BM_DumpHashMap/1/16k 42011 41970 16789
BM_DumpHashMap/1/64k 117895 117661 6135

Using htab_lookup_batch with this patch:
Benchmark Time(ns) CPU(ns) Iterations
---------------------------------------------------------------
BM_DumpHashMap/1/1k 2809 2803 249212
BM_DumpHashMap/1/4k 5318 5316 100000
BM_DumpHashMap/1/16k 14925 14895 47448
BM_DumpHashMap/1/64k 58870 58674 10000

Suggested-by: Luigi Rizzo <lrizzo@xxxxxxxxxx>
Cc: Yonghong Song <yhs@xxxxxx>
Signed-off-by: Brian Vazquez <brianvv@xxxxxxxxxx>
---
kernel/bpf/hashtab.c | 23 ++++++++---------------
1 file changed, 8 insertions(+), 15 deletions(-)

diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 2137e2200d95..150015ea6737 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -1351,7 +1351,6 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map,
struct hlist_nulls_head *head;
struct hlist_nulls_node *n;
unsigned long flags = 0;
- bool locked = false;
struct htab_elem *l;
struct bucket *b;
int ret = 0;
@@ -1410,19 +1409,19 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map,
dst_val = values;
b = &htab->buckets[batch];
head = &b->head;
- /* do not grab the lock unless need it (bucket_cnt > 0). */
- if (locked)
- flags = htab_lock_bucket(htab, b);
+ l = hlist_nulls_entry_safe(rcu_dereference_raw(hlist_nulls_first_rcu(head)),
+ struct htab_elem, hash_node);
+ if (!l && (batch + 1 < htab->n_buckets)) {
+ batch++;
+ goto again_nocopy;
+ }
+
+ flags = htab_lock_bucket(htab, b);
[...]