Re: [PATCH] net: mellanox: mlx5: fix error return code of mlx5e_stats_flower()

From: Saeed Mahameed
Date: Thu Mar 11 2021 - 17:49:09 EST


On Tue, 2021-03-09 at 11:44 +0200, Roi Dayan wrote:
>
>
> On 2021-03-09 10:32 AM, Jia-Ju Bai wrote:
> >
> >
> > On 2021/3/9 16:24, Roi Dayan wrote:
> > >
> > >
> > > On 2021-03-09 10:20 AM, Roi Dayan wrote:
> > > >
> > > >
> > > > On 2021-03-06 3:47 PM, Jia-Ju Bai wrote:
> > > > > When mlx5e_tc_get_counter() returns NULL to counter or
> > > > > mlx5_devcom_get_peer_data() returns NULL to peer_esw, no
> > > > > error return
> > > > > code of mlx5e_stats_flower() is assigned.
> > > > > To fix this bug, err is assigned with -EINVAL in these cases.
> > > > >
> > > > > Reported-by: TOTE Robot <oslab@xxxxxxxxxxxxxxx>

Hey Jia-Ju, What are the conditions for this robot to raise a flag?
sometimes it is totally normal to abort a function and return 0.. i am
just curious to know ?


> > > > > Signed-off-by: Jia-Ju Bai <baijiaju1990@xxxxxxxxx>
> > > > > ---
> > > > >   drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 12
> > > > > +++++++++---
> > > > >   1 file changed, 9 insertions(+), 3 deletions(-)
> > > > >
> > > > > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
> > > > > b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
> > > > > index 0da69b98f38f..1f2c9da7bd35 100644
> > > > > --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
> > > > > +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
> > > > > @@ -4380,8 +4380,10 @@ int mlx5e_stats_flower(struct
> > > > > net_device
> > > > > *dev, struct mlx5e_priv *priv,
> > > > >       if (mlx5e_is_offloaded_flow(flow) ||
> > > > > flow_flag_test(flow, CT)) {
> > > > >           counter = mlx5e_tc_get_counter(flow);
> > > > > -        if (!counter)
> > > > > +        if (!counter) {
> > > > > +            err = -EINVAL;
> > > > >               goto errout;
> > > > > +        }
> > > > >           mlx5_fc_query_cached(counter, &bytes, &packets,
> > > > > &lastuse);
> > > > >       }
> > > > > @@ -4390,8 +4392,10 @@ int mlx5e_stats_flower(struct
> > > > > net_device
> > > > > *dev, struct mlx5e_priv *priv,
> > > > >        * un-offloaded while the other rule is offloaded.
> > > > >        */
> > > > >       peer_esw = mlx5_devcom_get_peer_data(devcom,
> > > > > MLX5_DEVCOM_ESW_OFFLOADS);
> > > > > -    if (!peer_esw)
> > > > > +    if (!peer_esw) {
> > > > > +        err = -EINVAL;
> > > >

This is not an error flow, i am curious what are the thresholds of this
robot ?

> > > > note here it's not an error. it could be there is no peer esw
> > > > so just continue with the stats update.
> > > >
> > > > >           goto out;
> > > > > +    }
> > > > >       if (flow_flag_test(flow, DUP) &&
> > > > >           flow_flag_test(flow->peer_flow, OFFLOADED)) {
> > > > > @@ -4400,8 +4404,10 @@ int mlx5e_stats_flower(struct
> > > > > net_device
> > > > > *dev, struct mlx5e_priv *priv,
> > > > >           u64 lastuse2;
> > > > >           counter = mlx5e_tc_get_counter(flow->peer_flow);
> > > > > -        if (!counter)
> > > > > +        if (!counter) {
> > > > > +            err = -EINVAL;
> > >
> > > this change is problematic. the current goto is to do stats
> > > update with
> > > the first counter stats we got but if you now want to return an
> > > error
> > > then you probably should not do any update at all.
> >
> > Thanks for your reply :)
> > I am not sure whether an error code should be returned here?
> > If so, flow_stats_update(...) should not be called here?
> >
> >
> > Best wishes,
> > Jia-Ju Bai
> >
>
> basically flow and peer_flow should be valid and protected from
> changes,
> and counter should be valid.
> it looks like the check here is more of a sanity check if something
> goes
> wrong but shouldn't. you can just let it be, update the stats from
> the
> first queried counter.
>

Roi, let's consider returning an error code here, we shouldn't be
silently returning if we are not expecting these errors,

why would mlx5e_stats_flower() be called if stats are not offloaded ?

Thanks,
Saeed.