Re: [PATCH net-next v3 10/12] net/mlx5e: Implement queue mgmt ops and single channel swap

From: Jakub Kicinski
Date: Thu Jun 12 2025 - 18:53:03 EST


On Thu, 12 Jun 2025 13:44:44 -0700 Mina Almasry wrote:
> > On Wed, 2025-06-11 at 22:33 -0700, Mina Almasry wrote:
> > > Is this really better than maintaining uniformity of behavior between
> > > the drivers that support the queue mgmt api and just doing the
> > > mlx5e_deactivate_priv_channels and mlx5e_close_channel in the stop
> > > like core sorta expects?
> > >
> > > We currently use the ndos to restart a queue, but I'm imagining in the
> > > future we can expand it to create queues on behalf of the queues. The
> > > stop queue API may be reused in other contexts, like maybe to kill a
> > > dynamically created devmem queue or something, and this specific
> > > driver may stop working because stop actually doesn't do anything?
> > >
> >
> > The .ndo_queue_stop operation doesn't make sense by itself for mlx5,
> > because the current mlx5 architecture is to atomically swap in all of
> > the channels.
> > The scenario you are describing, with a hypothetical ndo_queue_stop for
> > dynamically created devmem queues would leave all of the queues stopped
> > and the old channel deallocated in the channel array. Worse problems
> > would happen in that state than with today's approach, which leaves the
> > driver in functional state.
> >
> > Perhaps Saeed can add more details to this?
>
> I see, so essentially mlx5 supports restarting a queue but not
> necessarily stopping and starting a queue as separate actions?
>
> If so, could the comment on the function maybe be reworded to more
> strongly indicate that this is a limitation? Just asking because
> future driver authors interested in implementing the queue API will
> probably look at one of mlx5/gve/bnxt to see what an existing
> implementation looks like, and I would rather them follow bnxt/gve
> that is more in line with core's expectations if possible. But that's
> a minor concern; I'm fine with this patch.
>
> FWIW this may break in the future if core decides to add code that
> actually uses the stop operation as a 'stop', not as a stepping stone
> to 'restart', but I'm not sure we can do anything about that if it's a
> driver limitation.

Agreed, would be good to add a TODO and follow up on this.
It will bite us sooner or later. I suppose state_lock may
need to be dropped in favor of the netdev instance lock first?

I'm disappointed that mlx5 once again disrupts all rings to restart
a single one. But all existing drivers seem to do this, so I guess
it'd be unfair to push back based on just that :|