Re: [PATCH net-next v2 0/4] net: route: improve route hinting

From: Eric Dumazet
Date: Tue May 07 2024 - 09:08:33 EST


On Tue, May 7, 2024 at 2:43 PM Leone Fernando <leone4fernando@gmail.com> wrote:
>
> In 2017, Paolo Abeni introduced the hinting mechanism [1] to the routing
> sub-system. The hinting optimization improves performance by reusing
> previously found dsts instead of looking them up for each skb.
>
> This patch series introduces a generalized version of the hinting mechanism that
> can "remember" a larger number of dsts. This reduces the number of dst
> lookups for frequently encountered daddrs.
>
> Before diving into the code and the benchmarking results, it's important
> to address the removal of the old route cache [2] and why
> this solution is different. The original cache was complicated,
> vulnerable to DoS attacks, and had unstable performance.
>
> The new input dst_cache is much simpler thanks to its lazy approach,
> improving performance without the overhead of the removed cache
> implementation. Instead of using timers and GC, the deletion of invalid
> entries is performed lazily during their lookups.
> The dsts are stored in a simple, lightweight, static hash table. This
> keeps the lookup times fast yet stable, preventing DoS upon cache misses.
> The new input dst_cache implementation is built over the existing
> dst_cache code which supplies a fast lockless percpu behavior.
>
> The measurement setup consists of 2 machines with mlx5 100Gbit NICs.
> I sent small UDP packets with 5000 daddrs (10x the cache size) from one
> machine to the other while also varying the saddr and the tos. I set
> an iptables rule to drop the packets after routing. The receiving
> machine's CPU (i9) was saturated.
>
> Thanks a lot to David Ahern for all the help and guidance!
>
> I measured the rx PPS using ifpps and the per-queue PPS using ethtool -S.
> These are the results:

How are device dismantles taken into account?

I am currently tracking a bug in dst_cache that sometimes triggers when
running the pmtu.sh selftest.

Apparently, dst_cache_per_cpu_dst_set() can cache dsts that have no
dst->rt_uncached linkage.

There is no cleanup (at least in vxlan) to make sure cached dsts are
either freed or their dst->dev changed.


TEST: ipv6: cleanup of cached exceptions - nexthop objects [ OK ]
[ 1001.344490] vxlan: __vxlan_fdb_free calling dst_cache_destroy(ffff8f12422cbb90)
[ 1001.345253] dst_cache_destroy dst_cache=ffff8f12422cbb90 ->cache=0000417580008d30
[ 1001.378615] vxlan: __vxlan_fdb_free calling dst_cache_destroy(ffff8f12471e31d0)
[ 1001.379260] dst_cache_destroy dst_cache=ffff8f12471e31d0 ->cache=0000417580008608
[ 1011.349730] unregister_netdevice: waiting for veth_A-R1 to become free. Usage count = 7
[ 1011.350562] ref_tracker: veth_A-R1@000000009392ed3b has 1/6 users at
[ 1011.350562] dst_alloc+0x76/0x160
[ 1011.350562] ip6_dst_alloc+0x25/0x80
[ 1011.350562] ip6_pol_route+0x2a8/0x450
[ 1011.350562] ip6_pol_route_output+0x1f/0x30
[ 1011.350562] fib6_rule_lookup+0x163/0x270
[ 1011.350562] ip6_route_output_flags+0xda/0x190
[ 1011.350562] ip6_dst_lookup_tail.constprop.0+0x1d0/0x260
[ 1011.350562] ip6_dst_lookup_flow+0x47/0xa0
[ 1011.350562] udp_tunnel6_dst_lookup+0x158/0x210
[ 1011.350562] vxlan_xmit_one+0x4c6/0x1550 [vxlan]
[ 1011.350562] vxlan_xmit+0x535/0x1500 [vxlan]
[ 1011.350562] dev_hard_start_xmit+0x7b/0x1e0
[ 1011.350562] __dev_queue_xmit+0x20c/0xe40
[ 1011.350562] arp_xmit+0x1d/0x50
[ 1011.350562] arp_send_dst+0x7f/0xa0
[ 1011.350562] arp_solicit+0xf6/0x2f0
[ 1011.350562]
[ 1011.350562] ref_tracker: veth_A-R1@000000009392ed3b has 3/6 users at
[ 1011.350562] dst_alloc+0x76/0x160
[ 1011.350562] ip6_dst_alloc+0x25/0x80
[ 1011.350562] ip6_pol_route+0x2a8/0x450
[ 1011.350562] ip6_pol_route_output+0x1f/0x30
[ 1011.350562] fib6_rule_lookup+0x163/0x270
[ 1011.350562] ip6_route_output_flags+0xda/0x190
[ 1011.350562] ip6_dst_lookup_tail.constprop.0+0x1d0/0x260
[ 1011.350562] ip6_dst_lookup_flow+0x47/0xa0
[ 1011.350562] udp_tunnel6_dst_lookup+0x158/0x210
[ 1011.350562] vxlan_xmit_one+0x4c6/0x1550 [vxlan]
[ 1011.350562] vxlan_xmit+0x535/0x1500 [vxlan]
[ 1011.350562] dev_hard_start_xmit+0x7b/0x1e0
[ 1011.350562] __dev_queue_xmit+0x20c/0xe40
[ 1011.350562] ip6_finish_output2+0x2ea/0x6e0
[ 1011.350562] ip6_finish_output+0x143/0x320
[ 1011.350562] ip6_output+0x74/0x140
[ 1011.350562]
[ 1011.350562] ref_tracker: veth_A-R1@000000009392ed3b has 1/6 users at
[ 1011.350562] netdev_get_by_index+0xc0/0xe0
[ 1011.350562] fib6_nh_init+0x1a9/0xa90
[ 1011.350562] rtm_new_nexthop+0x6fa/0x1580
[ 1011.350562] rtnetlink_rcv_msg+0x155/0x3e0
[ 1011.350562] netlink_rcv_skb+0x61/0x110
[ 1011.350562] rtnetlink_rcv+0x19/0x20
[ 1011.350562] netlink_unicast+0x23f/0x380
[ 1011.350562] netlink_sendmsg+0x1fc/0x430
[ 1011.350562] ____sys_sendmsg+0x2ef/0x320
[ 1011.350562] ___sys_sendmsg+0x86/0xd0
[ 1011.350562] __sys_sendmsg+0x67/0xc0
[ 1011.350562] __x64_sys_sendmsg+0x21/0x30
[ 1011.350562] x64_sys_call+0x252/0x2030
[ 1011.350562] do_syscall_64+0x6c/0x190
[ 1011.350562] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 1011.350562]
[ 1011.350562] ref_tracker: veth_A-R1@000000009392ed3b has 1/6 users at
[ 1011.350562] ipv6_add_dev+0x136/0x530
[ 1011.350562] addrconf_notify+0x19d/0x770
[ 1011.350562] notifier_call_chain+0x65/0xd0
[ 1011.350562] raw_notifier_call_chain+0x1a/0x20
[ 1011.350562] call_netdevice_notifiers_info+0x54/0x90
[ 1011.350562] register_netdevice+0x61e/0x790
[ 1011.350562] veth_newlink+0x230/0x440
[ 1011.350562] __rtnl_newlink+0x7d2/0xaa0
[ 1011.350562] rtnl_newlink+0x4c/0x70
[ 1011.350562] rtnetlink_rcv_msg+0x155/0x3e0
[ 1011.350562] netlink_rcv_skb+0x61/0x110
[ 1011.350562] rtnetlink_rcv+0x19/0x20
[ 1011.350562] netlink_unicast+0x23f/0x380
[ 1011.350562] netlink_sendmsg+0x1fc/0x430
[ 1011.350562] ____sys_sendmsg+0x2ef/0x320
[ 1011.350562] ___sys_sendmsg+0x86/0xd0
[ 1011.350562]