Re: [PATCH v2 14/14] net: ethernet: mtk_eth_soc: support creating mac address based offload entries

From: Andrew Lunn
Date: Tue Apr 12 2022 - 13:37:47 EST


> It basically has to keep track of all possible destination ports, their STP
> state, all their fdb entries, member VLANs of all ports. It has to quickly
> react to changes in any of these.

switchdev gives you all of those i think. DSA does not make use of
them all, in particularly the fdb entries, because of the low
bandwidth management link to the switch. But look at the Mellanox
switch, it keeps its hardware fdb entries in sync with the software
fdb.

And you get every quick access to these, sometimes too quick in that
it is holding a spinlock when it calls the switchdev functions, and
you need to defer the handling in your driver if you want to use a
mutex, perform blocking IO etc.

> In order to implement this properly, I would also need to make more changes
> to mac80211. Right now, mac80211 drivers do not have access to the
> net_device pointer of virtual interfaces. So mac80211 itself would likely
> need to implement the switchdev ops and handle some of this.

So this again sounds like something which would be shared by IPA, and
any other hardware which can accelerate forwarding between WiFi and
some other sort of interface.

> There are also some other issues where I don't know how this is supposed to
> be solved properly:
> On MT7622 most of the bridge ports are connected to a MT7531 switch using
> DSA. Offloading (lan->wlan bridging or L3/L4 NAT/routing) is not handled by
> the switch itself, it is handled by a packet processing engine in the SoC,
> which knows how to handle the DSA tags of the MT7531 switch.
>
> So if I were to handle this through switchdev implemented on the wlan and
> ethernet devices, it would technically not be part of the same switch, since
> it's a behind a different component with a different driver.

What is important here is the user experience. The user is not
expected to know there is an accelerate being used. You setup the
bridge just as normal, using iproute2. You add routes in the normal
way, either by iproute2, or frr can add routes from OSPF, BGP, RIP or
whatever, via zebra. I'm not sure anybody has yet accelerated NAT, but
the same principle should be used, using iptables in the normal way,
and the accelerate is then informed and should accelerate it if
possible.

switchdev gives you notification of when anything changes. You can
have multiple receivers of these notifications, so the packet
processor can act on them as well as the DSA switch.

> Also, is switchdev able to handle the situation where only parts of the
> traffic is offloaded and the rest (e.g. multicast) is handled through the
> regular software path?

Yes, that is not a problem. I deliberately use the term
accelerator. We accelerate what Linux can already do. If the
accelerator hardware is not capable of something, Linux still is, so
just pass it the frames and it will do the right thing. Multicast is a
good example of this, many of the DSA switch drivers don't accelerate
it.

> In my opinion, handling it through the TC offload has a number of
> advantages:
> - It's a lot simpler
> - It uses the same kind of offloading rules that my software fastpath
> already uses
> - It allows more fine grained control over which traffic should be offloaded
> (src mac -> destination MAC tuple)
>
> I also plan on extending my software fast path code to support emulating
> bridging of WiFi client mode interfaces. This involves doing some MAC
> address translation with some IP address tracking. I want that to support
> hardware offload as well.
>
> I really don't think that desire for supporting switchdev based offload
> should be a blocker for accepting this code now, especially since my
> implementation relies on existing Linux network APIs without inventing any
> new ones, and there are valid use cases for using it, even with switchdev
> support in place.

What we need to avoid is fragmentation of the way we do things. It has
been decided that switchdev is how we use accelerators, and the user
should not really know anything about the accelerator. No other in
kernel network accelerator needs a user space component listening to
netlink notifications and programming the accelerator from user space.
Do we really want two ways to do this?

Andrew