Re: [PATCH v2 14/14] net: ethernet: mtk_eth_soc: support creating mac address based offload entries

From: Felix Fietkau
Date: Tue Apr 12 2022 - 13:52:12 EST



On 12.04.22 19:37, Andrew Lunn wrote:
It basically has to keep track of all possible destination ports, their STP
state, all their fdb entries, member VLANs of all ports. It has to quickly
react to changes in any of these.

switchdev gives you all of those, I think. DSA does not make use of
them all, in particular the fdb entries, because of the low
bandwidth management link to the switch. But look at the Mellanox
switch, it keeps its hardware fdb entries in sync with the software
fdb.

And you get very quick access to these, sometimes too quick in that
it is holding a spinlock when it calls the switchdev functions, and
you need to defer the handling in your driver if you want to use a
mutex, perform blocking IO etc.
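
As a rough illustration of the deferral described above, here is a minimal user-space model in Python. It is not kernel code: the class and method names (DeferredFdbHandler, notify, run_deferred) are hypothetical, and the dictionary only stands in for a hardware fdb table. The point is that the notifier copies the event and returns, and the slow work happens later in a context that is allowed to sleep.

```python
from collections import deque


class DeferredFdbHandler:
    """User-space model: the notifier only records events, slow work runs later."""

    def __init__(self):
        self.pending = deque()  # events captured by the notifier
        self.hw_fdb = {}        # stands in for the hardware fdb table

    def notify(self, event, mac, port):
        # Models a callback invoked with a spinlock held: it must not
        # sleep, so it only copies the event data and returns.
        self.pending.append((event, mac, port))

    def run_deferred(self):
        # Models the deferred handler, which may take a mutex or do
        # blocking IO while it programs the hardware.
        while self.pending:
            event, mac, port = self.pending.popleft()
            if event == "add":
                self.hw_fdb[mac] = port
            elif event == "del":
                self.hw_fdb.pop(mac, None)
```

In a real driver the deferred part would typically be scheduled on a workqueue; here it is called explicitly so the ordering is easy to follow.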

In order to implement this properly, I would also need to make more changes
to mac80211. Right now, mac80211 drivers do not have access to the
net_device pointer of virtual interfaces. So mac80211 itself would likely
need to implement the switchdev ops and handle some of this.

So this again sounds like something which would be shared by IPA, and
any other hardware which can accelerate forwarding between WiFi and
some other sort of interface.

I would really like to see an example of how this should be done.
Is there a work in progress tree for IPA with offloading? Because the code
that I see upstream doesn't seem to have any of that - or did I look in the
wrong place?

There are also some other issues where I don't know how this is supposed to
be solved properly:
On MT7622 most of the bridge ports are connected to a MT7531 switch using
DSA. Offloading (lan->wlan bridging or L3/L4 NAT/routing) is not handled by
the switch itself, it is handled by a packet processing engine in the SoC,
which knows how to handle the DSA tags of the MT7531 switch.

So if I were to handle this through switchdev implemented on the wlan and
ethernet devices, it would technically not be part of the same switch, since
it's behind a different component with a different driver.

What is important here is the user experience. The user is not
expected to know there is an accelerator being used. You set up the
bridge just as normal, using iproute2. You add routes in the normal
way, either by iproute2, or frr can add routes from OSPF, BGP, RIP or
whatever, via zebra. I'm not sure anybody has yet accelerated NAT, but
the same principle should be used, using iptables in the normal way,
and the accelerator is then informed and should accelerate it if
possible.

Accelerated NAT on MT7622 has already been present in the upstream code for
a while. It's there for ethernet, and with my patches it also works for
ethernet -> wlan.

switchdev gives you notification of when anything changes. You can
have multiple receivers of these notifications, so the packet
processor can act on them as well as the DSA switch.

Also, is switchdev able to handle the situation where only part of the
traffic is offloaded and the rest (e.g. multicast) is handled through the
regular software path?

Yes, that is not a problem. I deliberately use the term
accelerator. We accelerate what Linux can already do. If the
accelerator hardware is not capable of something, Linux still is, so
just pass it the frames and it will do the right thing. Multicast is a
good example of this, many of the DSA switch drivers don't accelerate
it.

Don't get me wrong, I'm not against switchdev support at all. I just don't
know how to do it yet, and the code that I put in place is useful for
non-switchdev use cases as well.

In my opinion, handling it through the TC offload has a number of
advantages:
- It's a lot simpler
- It uses the same kind of offloading rules that my software fastpath
already uses
- It allows more fine grained control over which traffic should be offloaded
(src mac -> destination MAC tuple)
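
To make the (src MAC -> destination MAC) granularity concrete, here is a small user-space model in Python. It is only a sketch under stated assumptions, not the driver's implementation: MacPairOffload, add_entry and forward are hypothetical names, and refusing multicast destinations is an assumption taken from the multicast example earlier in this thread. Anything without a matching entry falls back to the regular software path.

```python
def is_multicast(mac):
    # The least significant bit of the first octet marks a group address.
    return int(mac.split(":")[0], 16) & 1 == 1


class MacPairOffload:
    """Model of an offload table keyed on (source MAC, destination MAC)."""

    def __init__(self):
        self.entries = {}  # (src_mac, dst_mac) -> egress port

    def add_entry(self, src, dst, port):
        # Assumption for this sketch: group addresses are never offloaded.
        if is_multicast(dst):
            raise ValueError("multicast stays on the software path")
        self.entries[(src, dst)] = port

    def forward(self, src, dst):
        # Only an exact (src, dst) match is offloaded; everything else
        # is handed to the software path.
        port = self.entries.get((src, dst))
        if port is None:
            return ("software", None)
        return ("offload", port)
```

Because the key is the full pair, offloading one direction of a flow does not imply offloading the reverse direction; that needs its own entry.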

I also plan on extending my software fast path code to support emulating
bridging of WiFi client mode interfaces. This involves doing some MAC
address translation with some IP address tracking. I want that to support
hardware offload as well.

I really don't think that desire for supporting switchdev based offload
should be a blocker for accepting this code now, especially since my
implementation relies on existing Linux network APIs without inventing any
new ones, and there are valid use cases for using it, even with switchdev
support in place.

What we need to avoid is fragmentation of the way we do things. It has
been decided that switchdev is how we use accelerators, and the user
should not really know anything about the accelerator. No other
in-kernel network accelerator needs a user space component listening to
netlink notifications and programming the accelerator from user space.
Do we really want two ways to do this?

There's always some overlap in what the APIs can do. And when it comes to
the "client mode bridge" use case that I mentioned, I would also need
exactly the same API that I put in place here. And this is not something
that can (or even should) be done using switchdev. mac80211 prevents adding
client mode interfaces to bridges for a reason.

- Felix