Re: [PATCH net 4/6] net/mlx5: Destroy vport QoS element when no configuration remains

From: Przemek Kitszel
Date: Thu Aug 14 2025 - 05:59:29 EST


On 8/13/25 16:31, Tariq Toukan wrote:
From: Carolina Jubran <cjubran@xxxxxxxxxx>

If a VF has been configured and the user later clears all QoS settings,
the vport element remains in the firmware QoS tree. This leads to
inconsistent behavior compared to VFs that were never configured, since
the FW assumes that unconfigured VFs are outside the QoS hierarchy.
As a result, the bandwidth share across VFs may differ, even though
none of them appear to have any configuration.

Align the driver behavior with the FW expectation by destroying the
vport QoS element when all configurations are removed.

Fixes: c9497c98901c ("net/mlx5: Add support for setting VF min rate")
Fixes: cf7e73770d1b ("net/mlx5: Manage TC arbiter nodes and implement full support for tc-bw")
Signed-off-by: Carolina Jubran <cjubran@xxxxxxxxxx>
Reviewed-by: Cosmin Ratiu <cratiu@xxxxxxxxxx>
Signed-off-by: Tariq Toukan <tariqt@xxxxxxxxxx>
---
.../net/ethernet/mellanox/mlx5/core/esw/qos.c | 54 +++++++++++++++++--
1 file changed, 51 insertions(+), 3 deletions(-)


+static bool esw_vport_qos_check_and_disable(struct mlx5_vport *vport,
+ struct devlink_rate *parent,
+ u64 tx_max, u64 tx_share,
+ u32 *tc_bw)
+{
+ struct mlx5_eswitch *esw = vport->dev->priv.eswitch;
+
+ if (parent || tx_max || tx_share || !esw_qos_tc_bw_disabled(tc_bw))
+ return false;
+
+ esw_qos_lock(esw);
+ mlx5_esw_qos_vport_disable_locked(vport);
+ esw_qos_unlock(esw);
+
+ return true;
+}
+
int mlx5_esw_qos_init(struct mlx5_eswitch *esw)
{
if (esw->qos.domain)
@@ -1703,6 +1731,11 @@ int mlx5_esw_devlink_rate_leaf_tx_share_set(struct devlink_rate *rate_leaf, void
if (!mlx5_esw_allowed(esw))
return -EPERM;
+ if (esw_vport_qos_check_and_disable(vport, rate_leaf->parent,
+ rate_leaf->tx_max, tx_share,
+ rate_leaf->tc_bw))
+ return 0;
+

I would rather keep executing the code that "sets tx_share to 0 and
propagates the info", and only then prune all-0 nodes.
Same for other params (tx_max, ...)

That would be less risky and more future-proof, and also would let your
check&disable function to take less params.

Finally, the name is poor, what about:?
esw_vport_qos_prune_empty(vport, rate_leaf);
(after applying my prev suggestion the above line will be at the
bottom of function).

Also a note, that if you apply the above, it would be also good to
keep the "esw_qos_lock() just once" (as it is now)

err = esw_qos_devlink_rate_to_mbps(vport->dev, "tx_share", &tx_share, extack);
if (err)
return err;
@@ -1724,6 +1757,11 @@ int mlx5_esw_devlink_rate_leaf_tx_max_set(struct devlink_rate *rate_leaf, void *
if (!mlx5_esw_allowed(esw))
return -EPERM;
+ if (esw_vport_qos_check_and_disable(vport, rate_leaf->parent, tx_max,
+ rate_leaf->tx_share,
+ rate_leaf->tc_bw))
+ return 0;
+
err = esw_qos_devlink_rate_to_mbps(vport->dev, "tx_max", &tx_max, extack);
if (err)
return err;
@@ -1749,6 +1787,11 @@ int mlx5_esw_devlink_rate_leaf_tc_bw_set(struct devlink_rate *rate_leaf,
if (!mlx5_esw_allowed(esw))
return -EPERM;
+ if (esw_vport_qos_check_and_disable(vport, rate_leaf->parent,
+ rate_leaf->tx_max,
+ rate_leaf->tx_share, tc_bw))
+ return 0;
+
disable = esw_qos_tc_bw_disabled(tc_bw);
esw_qos_lock(esw);
@@ -1930,6 +1973,11 @@ int mlx5_esw_devlink_rate_leaf_parent_set(struct devlink_rate *devlink_rate,
struct mlx5_esw_sched_node *node;
struct mlx5_vport *vport = priv;
+ if (esw_vport_qos_check_and_disable(vport, parent, devlink_rate->tx_max,
+ devlink_rate->tx_share,
+ devlink_rate->tc_bw))
+ return 0;
+
if (!parent)
return mlx5_esw_qos_vport_update_parent(vport, NULL, extack);