Re: [PATCH net-next] net/mlx5e: Reuse per-RQ XDP buffer to avoid stack zeroing overhead

From: Tariq Toukan
Date: Fri May 16 2025 - 09:49:42 EST




On 15/05/2025 3:26, Alexei Starovoitov wrote:
On Wed, May 14, 2025 at 1:04 PM Tariq Toukan <tariqt@xxxxxxxxxx> wrote:

From: Carolina Jubran <cjubran@xxxxxxxxxx>

CONFIG_INIT_STACK_ALL_ZERO introduces a performance cost by
zero-initializing all stack variables on function entry. The mlx5 XDP
RX path previously allocated a struct mlx5e_xdp_buff on the stack per
received CQE, resulting in measurable performance degradation under
this config.

This patch reuses a mlx5e_xdp_buff stored in the mlx5e_rq struct,
avoiding per-CQE stack allocations and repeated zeroing.

With this change, XDP_DROP and XDP_TX performance matches that of
kernels built without CONFIG_INIT_STACK_ALL_ZERO.

Performance was measured on a ConnectX-6Dx using a single RX channel
(1 CPU at 100% usage) at ~50 Mpps. The baseline results were taken from
net-next-6.15.

Stack zeroing disabled:
- XDP_DROP:
* baseline: 31.47 Mpps
* baseline + per-RQ allocation: 32.31 Mpps (+2.68%)

- XDP_TX:
* baseline: 12.41 Mpps
* baseline + per-RQ allocation: 12.95 Mpps (+4.30%)

Looks good, but where are these gains coming from ?
The patch just moves mxbuf from stack to rq.
The number of operations should really be the same.


I guess it's cache related. Hot/cold areas, alignments, movement of other fields in the mlx5e_rq structure...

Stack zeroing enabled:
- XDP_DROP:
* baseline: 24.32 Mpps
* baseline + per-RQ allocation: 32.27 Mpps (+32.7%)

This part makes sense.