Re: [RFC v2] net/core: add optional threading for rps backlog processing

From: Felix Fietkau
Date: Fri Feb 17 2023 - 08:41:05 EST


On 17.02.23 13:57, Eric Dumazet wrote:
> On Fri, Feb 17, 2023 at 1:35 PM Felix Fietkau <nbd@xxxxxxxx> wrote:
>>
>> On 17.02.23 13:23, Eric Dumazet wrote:
>> > On Fri, Feb 17, 2023 at 11:06 AM Felix Fietkau <nbd@xxxxxxxx> wrote:
>> >>
>> >> When dealing with few flows or an imbalance in CPU utilization, static RPS
>> >> CPU assignment can be too inflexible. Add support for enabling threaded NAPI
>> >> for RPS backlog processing in order to allow the scheduler to better balance
>> >> processing. This helps spread the load better across idle CPUs.
>> >>
>> >> Signed-off-by: Felix Fietkau <nbd@xxxxxxxx>
>> >> ---
>> >>
>> >> RFC v2:
>> >> - fix rebase error in rps locking
>> >
>> > Why only deal with RPS?
>> >
>> > It seems you propose that the softnet_data backlog be processed by a
>> > thread instead of from a softirq?
>> Right. I originally wanted to mainly improve RPS, but my patch does
>> cover backlog in general. I will update the description in the next
>> version. Does the approach in general make sense to you?


> I do not know, this seems to lack some (perf) numbers, and
> descriptions of added max latencies and stuff like that :)
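
For context, the mechanism under discussion follows the pattern of the
existing threaded NAPI support (napi_threaded_poll() in net/core/dev.c),
applied to the per-CPU backlog. A minimal sketch of such a poll loop, with
illustrative names and the completion/rescheduling details omitted, might
look like this (this is not the actual patch):

#include <linux/kthread.h>
#include <linux/netdevice.h>
#include <linux/sched.h>

/* Illustrative backlog poll thread, modeled on napi_threaded_poll().
 * The enqueue path wakes this thread with wake_up_process() instead of
 * raising NET_RX_SOFTIRQ, so the scheduler is free to migrate the
 * processing to an idle CPU.
 */
static int backlog_napi_thread(void *data)
{
        struct napi_struct *napi = data;

        while (!kthread_should_stop()) {
                /* Sleep until the backlog NAPI is scheduled. */
                set_current_state(TASK_INTERRUPTIBLE);
                if (!test_bit(NAPI_STATE_SCHED, &napi->state)) {
                        schedule();
                        continue;
                }
                __set_current_state(TASK_RUNNING);

                /* Run the backlog poll function (process_backlog for
                 * the softnet backlog) with bottom halves disabled,
                 * one budget at a time.
                 */
                local_bh_disable();
                napi->poll(napi, NAPI_POLL_WEIGHT);
                local_bh_enable();

                cond_resched();
        }
        return 0;
}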
I just ran some tests where I used an MT7621 device (dual-core 800 MHz
MIPS, 4 threads) as a router doing NAT without flow offloading.

Using the flent RRUL test between 2 PCs connected through the router,
I get these results:

rps_threaded=0 (combined CPU idle time around 27%):

                          avg      median      99th %     # data pts
 Ping (ms) ICMP    :    26.08       28.70       54.74 ms         199
 Ping (ms) UDP BE  :     1.96       24.12       37.28 ms         200
 Ping (ms) UDP BK  :     1.88       15.86       27.30 ms         200
 Ping (ms) UDP EF  :     1.98       31.77       54.10 ms         200
 Ping (ms) avg     :     1.94         N/A         N/A ms         200
 TCP download BE   :    69.25       70.20      139.55 Mbits/s    200
 TCP download BK   :    95.15       92.51      163.93 Mbits/s    200
 TCP download CS5  :   133.64      129.10      292.46 Mbits/s    200
 TCP download EF   :   129.86      127.70      254.47 Mbits/s    200
 TCP download avg  :   106.97         N/A         N/A Mbits/s    200
 TCP download sum  :   427.90         N/A         N/A Mbits/s    200
 TCP totals        :   864.43         N/A         N/A Mbits/s    200
 TCP upload BE     :    97.54       96.67      163.99 Mbits/s    200
 TCP upload BK     :   139.76      143.88      190.37 Mbits/s    200
 TCP upload CS5    :    97.52       94.70      206.60 Mbits/s    200
 TCP upload EF     :   101.71      106.72      147.88 Mbits/s    200
 TCP upload avg    :   109.13         N/A         N/A Mbits/s    200
 TCP upload sum    :   436.53         N/A         N/A Mbits/s    200

rps_threaded=1 (combined CPU idle time around 16%):

                          avg      median      99th %     # data pts
 Ping (ms) ICMP    :    13.70       16.10       27.60 ms         199
 Ping (ms) UDP BE  :     2.03       18.35       24.16 ms         200
 Ping (ms) UDP BK  :     2.03       18.36       29.13 ms         200
 Ping (ms) UDP EF  :     2.36       25.20       41.50 ms         200
 Ping (ms) avg     :     2.14         N/A         N/A ms         200
 TCP download BE   :   118.69      120.94      160.12 Mbits/s    200
 TCP download BK   :   134.67      137.81      177.14 Mbits/s    200
 TCP download CS5  :   126.15      127.81      174.84 Mbits/s    200
 TCP download EF   :    78.36       79.41      143.31 Mbits/s    200
 TCP download avg  :   114.47         N/A         N/A Mbits/s    200
 TCP download sum  :   457.87         N/A         N/A Mbits/s    200
 TCP totals        :   918.19         N/A         N/A Mbits/s    200
 TCP upload BE     :   112.20      111.55      164.38 Mbits/s    200
 TCP upload BK     :   144.99      139.24      205.12 Mbits/s    200
 TCP upload CS5    :    93.09       95.50      132.39 Mbits/s    200
 TCP upload EF     :   110.04      108.21      207.00 Mbits/s    200
 TCP upload avg    :   115.08         N/A         N/A Mbits/s    200
 TCP upload sum    :   460.32         N/A         N/A Mbits/s    200

As you can see, both throughput and latency improve, because the load can
be distributed more evenly across the CPU cores.
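
The improvement comes from where the work is allowed to run: with
rps_threaded=1 the enqueue path wakes a kthread that the scheduler may
migrate to an idle core, instead of raising a softirq pinned to the RPS
target CPU. A rough sketch of that dispatch decision follows; the
backlog_schedule() helper is hypothetical, though mainline's
____napi_schedule() makes a similar check for threaded NAPI:

#include <linux/interrupt.h>
#include <linux/netdevice.h>
#include <linux/sched.h>

/* Hypothetical dispatch helper: with threading enabled, wake the
 * per-CPU backlog kthread; otherwise fall back to the classic softirq
 * path. Assumes it is called with hard irqs disabled, like
 * ____napi_schedule().
 */
static void backlog_schedule(struct softnet_data *sd)
{
        if (test_bit(NAPI_STATE_THREADED, &sd->backlog.state))
                wake_up_process(sd->backlog.thread);
        else
                __raise_softirq_irqoff(NET_RX_SOFTIRQ);
}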

- Felix