RE: [PATCH V2 net-next] net: fec: add CBS offload support

From: Wei Fang
Date: Thu Feb 16 2023 - 08:03:49 EST



> -----Original Message-----
> From: Alexander Lobakin <alexandr.lobakin@xxxxxxxxx>
> Sent: 2023年2月15日 0:49
> To: Wei Fang <wei.fang@xxxxxxx>
> Cc: Shenwei Wang <shenwei.wang@xxxxxxx>; Clark Wang
> <xiaoning.wang@xxxxxxx>; davem@xxxxxxxxxxxxx; edumazet@xxxxxxxxxx;
> kuba@xxxxxxxxxx; pabeni@xxxxxxxxxx; simon.horman@xxxxxxxxxxxx;
> andrew@xxxxxxx; netdev@xxxxxxxxxxxxxxx; dl-linux-imx <linux-imx@xxxxxxx>;
> linux-kernel@xxxxxxxxxxxxxxx
> Subject: Re: [PATCH V2 net-next] net: fec: add CBS offload support
>
> From: Wei Fang <wei.fang@xxxxxxx>
> Date: Tue, 14 Feb 2023 09:34:09 +0000
>
> >> -----Original Message-----
> >> From: Alexander Lobakin <alexandr.lobakin@xxxxxxxxx>
> >> Sent: 2023年2月14日 0:05
> >> To: Wei Fang <wei.fang@xxxxxxx>
> >> Cc: Shenwei Wang <shenwei.wang@xxxxxxx>; Clark Wang
> >> <xiaoning.wang@xxxxxxx>; davem@xxxxxxxxxxxxx;
> edumazet@xxxxxxxxxx;
> >> kuba@xxxxxxxxxx; pabeni@xxxxxxxxxx; simon.horman@xxxxxxxxxxxx;
> >> andrew@xxxxxxx; netdev@xxxxxxxxxxxxxxx; dl-linux-imx
> >> <linux-imx@xxxxxxx>; linux-kernel@xxxxxxxxxxxxxxx
> >> Subject: Re: [PATCH V2 net-next] net: fec: add CBS offload support
> >>
> >> From: Wei Fang <wei.fang@xxxxxxx>
> >> Date: Mon, 13 Feb 2023 17:29:12 +0800
> >>
> >>> From: Wei Fang <wei.fang@xxxxxxx>
> >>>
> >>
> >>> @@ -571,6 +575,12 @@ struct fec_stop_mode_gpr {
> >>> u8 bit;
> >>> };
> >>>
> >>> +struct fec_cbs_params {
> >>> + bool enable[FEC_ENET_MAX_TX_QS];
> >>
> >> Maybe smth like
> >>
> >> DECLARE_BITMAP(enabled, FEC_ENET_MAX_TX_QS);
> >>
> >> ?
> > I think it's a matter of personal habit.
>
> It's a matter that you can fit 32 bits if you declare it as u32 or 64 bits if as a
> bitmap vs you waste 1 byte per 1 true/false flag as you do in this version :D
>
Okay, I'll optimize this part.

> >
> >>
> >>> + int idleslope[FEC_ENET_MAX_TX_QS];
> >>> + int sendslope[FEC_ENET_MAX_TX_QS];
> >>
> >> Can they actually be negative? (probably I'll see it below)
> >>
> > idleslope and sendslope are used to save the parameters passed in from user
> space.
> > And the sendslope should always be negative.
>
> Parameters coming from userspace must be validated before saving them
> anywhere.
> Also, if sendslope is always negative as you say, then just negate it when you
> read it from userspace and store as u32.
Sorry, I still don't understand why u32 is necessary to store parameters from user
space? In addition, people who understand 802.1Qav may be confused when they
see that sendslope is a u32 type.

>
> >
> >>> +};
> >>> +
> >>> /* The FEC buffer descriptors track the ring buffers. The rx_bd_base and
> >>> * tx_bd_base always point to the base of the buffer descriptors. The
> >>> * cur_rx and cur_tx point to the currently available buffer.
> >>> @@ -679,6 +689,9 @@ struct fec_enet_private {
> >>> /* XDP BPF Program */
> >>> struct bpf_prog *xdp_prog;
> >>>
> >>> + /* CBS parameters */
> >>> + struct fec_cbs_params cbs;
> >>> +
> >>> u64 ethtool_stats[];
> >>> };
> >>>
> >>> @@ -66,6 +66,7 @@
> >>> #include <linux/mfd/syscon.h>
> >>> #include <linux/regmap.h>
> >>> #include <soc/imx/cpuidle.h>
> >>> +#include <net/pkt_sched.h>
> >>
> >> Some alphabetic order? At least for new files :D
> >>
> > I just want to keep the reverse Christmas tree style, although the
> > whole #include order is already out of the style.
>
> RCT is applied to variable declaration inside functions, not to header include
> block :D
It seems that I misunderstood before, I'll fix it.

>
> >
> >>> #include <linux/filter.h>
> >>> #include <linux/bpf.h>
> >>>
> >>> @@ -1023,6 +1024,174 @@ static void fec_enet_reset_skb(struct
> >> net_device *ndev)
> >>> }
> >>> }
> >>>
> >>> +static u32 fec_enet_get_idle_slope(u8 bw)
> >>
> >> Just use `u32`, usually there's no reason to use types shorter than
> >> integer on the stack.
> >>
> >>> +{
> >>> + int msb, power;
> >>> + u32 idle_slope;
> >>> +
> >>> + if (bw >= 100)
> >>> + return 0;
> >>> +
> >>> + /* Convert bw to hardware idle slope */
> >>> + idle_slope = (512 * bw) / (100 - bw);
> >>> +
> >>
> >> Redundant newline. Also first pair of braces is optional and up to you.
> >>
> > I will fix this nit, thanks!
> >
> >>> + if (idle_slope >= 128) {
> >>> + /* For values greater than or equal to 128, idle_slope
> >>> + * rounded to the nearest multiple of 128.
> >>> + */
> >>
> >> But you can just do
> >>
> >> idle_slope = min(idle_slope, 128U);
> >>
> >> and still use the "standard" path below, without the conditional return?
> >> Or even combine it with the code above:
> >>
> >> idle_slope = min((512 * bw) / (100 - bw), 128U);
> >>
> > If idles_slope is greater than or equal to 128, idle_slope should be
> > rounded to the nearest multiple of 128. For example, if idle_slope =
> > 255, it should be set to 256. However min(idle_slope, 128U) is not as
> expected.
>
> So you say that for for >= 128 it must be a multiple of 128 and for <
> 128 it must be pow-2? Then I did misread your code a bit, sorry.
> But then my propo regarding adding round_closest() applies here as well.
>
yes.

> >> I think `fls() - 1` is preferred over `__fls()` since it may go UB.
> >> And some checks wouldn't hurt.
> >>
> > I used fls() at first, but later changed to __fls. Now that you
> > pointed out its possible risks, I'll change it back. Thanks.
> >
> >>> + power = BIT(msb);
> >>
> >> Oh okay. Then rounddown_pow_of_two() is what you're looking for.
> >>
> >> power = rounddown_pow_of_two(idle_slope);
> >>
> >> Or even just use one variable, @idle_slope.
> >>
> > Thanks for the reminder, I think I should use roundup_pow_of_two().
>
> But your code does what rounddown_pow_of_two() does, not roundup.
> Imagine that you have 0b1111, then your code will turn it into 0b1000, not
> 0b10000. Or am I missing something?
>
0b1111 is nearest to 0b10000, so it should be turned into 0x10000.

> >
> >>> + idle_slope = DIV_ROUND_CLOSEST(idle_slope, power) * power;
> >>> +
> >>> + return idle_slope;
> >>
> >> You can return DIV_ROUND_ ... right away, without assignning first.
> >> Also, I'm thinking of that this might be a generic helper. We have
> >> roundup() and rounddown(), this could be something like "round_closest()"?
> >>
> >>> +}
> >>> +
> >>> +static void fec_enet_set_cbs_idle_slope(struct fec_enet_private
> >>> +*fep) {
> >>> + u32 bw, val, idle_slope;
> >>> + int speed = fep->speed;
> >>> + int idle_slope_sum = 0;
> >>> + int i;
> >>
> >> Can any of them be negative?
> >>
> > No.
>
> So u32 for them.
>
> >
> >>> +
> >>> + if (!speed)
> >>> + return;
> >>> +
> >>> + for (i = 1; i < FEC_ENET_MAX_TX_QS; i++) {
> >>
> >> So you don't use the zeroth array elements at all? Why having them then?
> >>
> > Yes, the zeroth indicates queue 0, and queue 0 only support Non-AVB traffic.
> > Why having them then?
> > Firstly I think it more clear that the i indicates the index of queue.
> > Secondly, it is for more convenience. If you think it is
> > inappropriate, I will amend it later.
>
> Well, it's not recommended to allocate some space you will never use.
>
Okay, I'll fix it, thanks.
> >
> >>> + int port_tx_rate;
> >>
> >> (same for type)
> >>
> >>> +
> >>> + /* As defined in IEEE 802.1Q-2014 Section 8.6.8.2 item:
> >>> + * sendslope = idleslope - port_tx_rate
> >>> + * So we need to check whether port_tx_rate is equal to
> >>> + * the current link rate.
> >>
> >> [...]
> >>
> >>> + for (i = 1; i < FEC_ENET_MAX_TX_QS; i++) {
> >>> + bw = fep->cbs.idleslope[i] / (speed * 10);
> >>> + idle_slope = fec_enet_get_idle_slope(bw);
> >>> +
> >>> + val = readl(fep->hwp + FEC_DMA_CFG(i));
> >>> + val &= ~IDLE_SLOPE_MASK;
> >>> + val |= idle_slope & IDLE_SLOPE_MASK;
> >>
> >> u32_replace_bits() will do it for you.
> >>
> > Sorry, I can't find the definition of u32_replace_bits().
>
> Because Elixir doesn't index functions generated via macros. See the end of
> <linux/bitfield.h>, this is where the whole family gets defined.
>
Thanks, I got it.

> >
> >>> + writel(val, fep->hwp + FEC_DMA_CFG(i));
> >>> + }
> >>> +
> >>> + /* Enable Credit-based shaper. */
> >>> + val = readl(fep->hwp + FEC_QOS_SCHEME);
> >>> + val &= ~FEC_QOS_TX_SHEME_MASK;
> >>> + val |= CREDIT_BASED_SCHEME;
> >>
> >> (same)
> >>
> >>> + writel(val, fep->hwp + FEC_QOS_SCHEME); }
> >>> +
> >>> +static int fec_enet_setup_tc_cbs(struct net_device *ndev, void
> >>> +*type_data) {
> >>> + struct fec_enet_private *fep = netdev_priv(ndev);
> >>> + struct tc_cbs_qopt_offload *cbs = type_data;
> >>> + int queue = cbs->queue;
> >>> + int speed = fep->speed;
> >>> + int queue2;
> >>
> >> (types)
> >>
> >>> +
> >>> + if (!(fep->quirks & FEC_QUIRK_HAS_AVB))
> >>> + return -EOPNOTSUPP;
> >>> +
> >>> + /* Queue 1 for Class A, Queue 2 for Class B, so the ENET must
> >>> + * have three queues.
> >>> + */
> >>> + if (fep->num_tx_queues != FEC_ENET_MAX_TX_QS)
> >>> + return -EOPNOTSUPP;
> >>> +
> >>> + if (!speed) {
> >>> + netdev_err(ndev, "Link speed is 0!\n");
> >>
> >> ??? Is this possible? If so, why is it checked only here and why can
> >> it be possible?
> >>
> > It's possible if the board bootup without cable.
>
> Then it shouldn't be an error -- no link partner is a regular flow, not something
> horrendous.
yes, I'll fix it.
>
> >
> >>> + return -ECANCELED;
> >>
> >> (already mentioned in other review)
> >>
> >>> + }
> >>> +
> >>> + /* Queue 0 is not AVB capable */
> >>> + if (queue <= 0 || queue >= fep->num_tx_queues) {
> >>
> >> Is `< 0` possible? I realize it's `s32`, just curious.
> >>
> > If the wrong parameter is entered and the app does not check the
> > value, It might be 0. I'm not sure, just in case.
>
> Ah okay, so it's a check for userspace sanity. Then it should stay.
>
> >
> >>> + netdev_err(ndev, "The queue: %d is invalid!\n", queue);
> >>
> >> Maybe less emotions? There's no point in having `!` at the end of every
> error.
> >>
> > OK, it fine to remove '!'.
> >
> >>> + return -EINVAL;
> >>> + }
> >>> +
> >>> + if (!cbs->enable) {
> >>> + u32 val;
> >>> +
> >>> + val = readl(fep->hwp + FEC_QOS_SCHEME);
> >>> + val &= ~FEC_QOS_TX_SHEME_MASK;
> >>> + val |= ROUND_ROBIN_SCHEME;
> >>
> >> (u32_replace_bits())
> >>
> >>> + writel(val, fep->hwp + FEC_QOS_SCHEME);
> >>> +
> >>> + memset(&fep->cbs, 0, sizeof(fep->cbs));
> >>> +
> >>> + return 0;
> >>> + }
> >>> +
> >>> + if (cbs->idleslope - cbs->sendslope != speed * 1000 ||
> >>> + cbs->idleslope <= 0 || cbs->sendslope >= 0)
> >>
> >> So you check slopes here, why check them above then?
> >>
> > Because this function will be invoked due to some events in
> > fec_restart too, so I added these checks here. If you think it is
> > redundant, I will move these checks to fec_restart. Thanks.
>
> Maybe you could make a small static inline and use it where appropriate
> instead of open-coding?
>
That is a good suggestion, thanks.
> >
> >
> >>> + return -EINVAL;
> >>> +
> >>> + /* Another AVB queue */
> >>> + queue2 = (queue == 1) ? 2 : 1;
> >>
> >> Braces are unneeded.
> >>
> >>> + if (cbs->idleslope + fep->cbs.idleslope[queue2] > speed * 1000) {
> >>> + netdev_err(ndev,
> >>> + "The sum of all idle slope can't exceed link speed!\n");
> >>> + return -EINVAL;
> >>> + }
> >> [...]
> >>
> >> Thanks,
> >> Olek
> Thanks,
> Olek