Re: [net-next Patch v2 4/5] octeontx2-pf: Add devlink support to configure TL1 RR_PRIO

From: Maxim Mikityanskiy
Date: Tue Jan 24 2023 - 08:02:42 EST


On Tue, Jan 24, 2023 at 11:49:48AM +0000, Hariprasad Kelam wrote:
>
>
> > On Mon, Jan 23, 2023 at 02:45:48PM -0800, Jakub Kicinski wrote:
> > > On Mon, 23 Jan 2023 22:05:58 +0200 Maxim Mikityanskiy wrote:
> > > > OK, I seem to get it now, thanks for the explanation!
> > > >
> > > > How do you set the priority for HTB, though? You mentioned this
> > > > command to set priority of unclassified traffic:
> > > >
> > > > devlink -p dev param set pci/0002:04:00.0 name tl1_rr_prio value 6 \
> > > > cmode runtime
> > > >
> > > > But what is the command to change priority for HTB?
> > > >
> > > > What bothers me about using devlink to configure HTB priority is:
> > > >
> > > > 1. Software HTB implementation doesn't have this functionality, and
> > > > it always prioritizes unclassified traffic. As far as I understand,
> > > > the rule for tc stuff is that all features must have a reference
> > > > implementation in software.
> > > >
> > > > 2. Adding a flag (prefer unclassified vs prefer classified) to HTB
> > > > itself may be not straightforward, because your devlink command has
> > > > a second purpose of setting priorities between PFs/VFs, and it may
> > > > conflict with the HTB flag.
> > >
> > > If there is a two-stage hierarchy the lower level should be controlled
> > > by devlink-rate, no?
> >
> > From the last picture by Hariprasad, I understood that the user sets all
> > priorities (for unclassified traffic per PF and VF, and for HTB traffic) on the
> > same TL2 level, i.e. it's not two-stage. (Maybe I got it all wrong again?)
> >
> > I asked about the command to change the HTB priority, cause the
> > parameters aren't easily guessed, but I assume it's also devlink (i.e.
> > driver-specific).
> >
> Currently, we don't support changing HTB priority since TC_HTB_MODIFY is not yet supported.
> The driver implementation is inline with htb tc framework, below are commands we use for setting htb priority
>
> ethtool -K eth0 hw-tc-offload on
> tc qdisc replace dev eth0 root handle 1: htb offload
> tc class add dev eth0 parent 1: classid 1:1 htb rate 10Gbit prio 2
> tc class add dev eth0 parent 1: classid 1:1 htb rate 10Gbit prio 3

I thought there was a concept of a priority of the whole HTB tree in
your implementation...

So, if I run these commands:

devlink -p dev param set pci/0002:04:00.0 name tl1_rr_prio value 2 cmode runtime
tc class add dev eth0 parent 1: classid 1:1 htb rate 10Gbit prio 1
tc class add dev eth0 parent 1: classid 1:2 htb rate 10Gbit prio 3

Will it prioritize class 1:1 over unclassified traffic, and unclassified
traffic over class 1:2?

> > If there were two levels (the upper level chooses who goes first: HTB or
> > unclassified, and the lower level sets priorities per PF and VF for unclassified
> > traffic), that would be more straightforward to solve: the upper level should
> > be controlled by a new HTB parameter, and the lower level is device-specific,
> > thus it goes to devlink.
>
> The PF netdev and VFs share the same physical link and share the same TL1 node.
> Hardware supports one DWRR group and the rest are strict priority nodes. Driver configures
> the priority set by devlink to PF and VF traffic TL2 nodes such that traffic is forwarded
> to TL1 using DWRR algorithm.
>
> Now that if we add htb command for unclassified traffic, at any given point in time HTB
> rule only applies to a single interface, since we require to set DWRR priority at TL1,
> which applies to both PF/VFs, we feel it's a different use case altogether.

Thanks, with the example above your explanation makes sense to me now.

So, basically, in your implementation, entities prioritized by hardware
are: each HTB class, each VF and PF; you want to keep user-assigned
priorities for HTB classes, you want to let the user assign a priority
for unclassified traffic, but the latter must be equal for all VFs and
PF (for DWRR to work), correct? And that devlink command is only useful
in the HTB scenario, i.e. it doesn't matter what tl1_rr_prio you set if
HTB is not used, right?

What I don't like in the current implementation is that it adds a
feature to HTB, bypassing HTB (also not providing a software equivalent
of the feature). I would expect the priority of unclassified traffic to
be configured with tc commands that set up HTB (by the way, HTB has a
"default" option to choose a class for unclassified traffic, and a
priority can be set for this class - this functionality can be leveraged
for this purpose, or a new option can be added, whatever looks more
appropriate). On the other hand, I understand your hardware limitation
requiring to have the same priority for all VFs and PF on the same port.

It's hard to suggest something good here, actually... An obvious thought
is to let the first netdev that configures HTB choose the priority for
unclassified traffic, and block attempts from other netdevs to set a
different one, but this approach also has obvious drawbacks: PF has no
special powers here, and it can't set the desired priority if PF itself
doesn't use HTB (or doesn't configure it first, before VFs).

Another idea of mine is to keep the devlink command for enforcement
purpose, and make the behavior as follows:

1. The user will pick a priority for unclassified traffic when attaching
HTB.

2. If tl1_rr_prio was set with devlink, block attempts to attach HTB
with a different default priority.

3. If tl1_rr_prio wasn't set or was reset, allow attaching HTB to PF
with any default priority.

This way, VFs can't attach HTB with arbitrary priorities, only with the
one that the admin has enforced using devlink. At the same time, if VFs
aren't used, no extra steps are needed to just use HTB on a PF. On the
other hand, it adds some complexity and may sound controversial to
someone. Thoughts?