[PATCH net-next v2 0/4] net: route: improve route hinting

From: Leone Fernando
Date: Tue May 07 2024 - 08:43:15 EST


In 2019, Paolo Abeni introduced the hinting mechanism [1] to the routing
subsystem. The hinting optimization improves performance by reusing
previously found dsts instead of looking them up for each skb.

This patch series introduces a generalized version of the hinting mechanism that
can "remember" a larger number of dsts. This reduces the number of dst
lookups for frequently encountered daddrs.

Before diving into the code and the benchmarking results, it's important
to address the deletion of the old route cache [2] and why
this solution is different. The original cache was complicated,
vulnerable to DOS attacks and had unstable performance.

The new input dst_cache is much simpler thanks to its lazy approach,
improving performance without the overhead of the removed cache
implementation. Instead of relying on timers and GC, invalid entries
are deleted lazily when they are looked up.
The dsts are stored in a simple, lightweight, static hash table. This
keeps lookup times fast and bounded even on cache misses, which
prevents DOS.
The new input dst_cache implementation is built on top of the existing
dst_cache code, which provides fast, lockless per-CPU behavior.

The measurement setup consists of two machines, each with an mlx5
100Gbit NIC. I sent small UDP packets to 5000 different daddrs (10x the
cache size) from one machine to the other, while also varying the saddr
and the tos. I set an iptables rule to drop the packets after routing.
The receiving machine's CPU (i9) was saturated.

Thanks a lot to David Ahern for all the help and guidance!

I measured the rx PPS using ifpps and the per-queue PPS using ethtool -S.
These are the results:

Total PPS:
mainline   patched   delta
Kpps       Kpps      %
6903       8105      17.41

Per-Queue PPS:
Queue   mainline   patched
0       345775     411780
1       345252     414387
2       347724     407501
3       346232     413456
4       347271     412088
5       346808     400910
6       346243     406699
7       346484     409104
8       342731     404612
9       344068     407558
10      345832     409558
11      346296     409935
12      346900     399084
13      345980     404513
14      347244     405136
15      346801     408752
16      345984     410865
17      346632     405752
18      346064     407539
19      344861     408364
total   6921182    8157593

I also verified that the number of packets caught by the iptables rule
matches the measured PPS.

TCP throughput was not affected by the patch. Below is the iperf3 output:
mainline                      patched
15.4 GBytes  13.2 Gbits/sec   15.5 GBytes  13.2 Gbits/sec

[1] https://lore.kernel.org/netdev/cover.1574252982.git.pabeni@xxxxxxxxxx/
[2] https://lore.kernel.org/netdev/20120720.142502.1144557295933737451.davem@xxxxxxxxxxxxx/

v1->v2:
- fix bitwise cast warning
- improve measurement setup

v1:
- fix typo while allocating per-cpu cache
- set IPSKB_DOREDIRECT correctly when using a dst from the dst_cache
- always compile dst_cache

RFC-v2:
- remove unnecessary macro
- move inline to .h file

RFC-v1: https://lore.kernel.org/netdev/d951b371-4138-4bda-a1c5-7606a28c81f0@xxxxxxxxx/
RFC-v2: https://lore.kernel.org/netdev/3a17c86d-08a5-46d2-8622-abc13d4a411e@xxxxxxxxx/

Leone Fernando (4):
net: route: expire rt if the dst it holds is expired
net: dst_cache: add input_dst_cache API
net: route: always compile dst_cache
net: route: replace route hints with input_dst_cache

drivers/net/Kconfig | 1 -
include/net/dst_cache.h | 68 +++++++++++++++++++
include/net/dst_metadata.h | 2 -
include/net/ip_tunnels.h | 2 -
include/net/route.h | 6 +-
net/Kconfig | 4 --
net/core/Makefile | 3 +-
net/core/dst.c | 4 --
net/core/dst_cache.c | 132 +++++++++++++++++++++++++++++++++++++
net/ipv4/Kconfig | 1 -
net/ipv4/ip_input.c | 58 ++++++++--------
net/ipv4/ip_tunnel_core.c | 4 --
net/ipv4/route.c | 75 +++++++++++++++------
net/ipv4/udp_tunnel_core.c | 4 --
net/ipv6/Kconfig | 4 --
net/ipv6/ip6_udp_tunnel.c | 4 --
net/netfilter/nft_tunnel.c | 2 -
net/openvswitch/Kconfig | 1 -
net/sched/act_tunnel_key.c | 2 -
19 files changed, 291 insertions(+), 86 deletions(-)

--
2.34.1