Re: [PATCH v3 net-next 18/24] net: dsa: sja1105: Add support for traffic through standalone ports

From: Vladimir Oltean
Date: Sat Apr 13 2019 - 18:09:24 EST


On Sun, 14 Apr 2019 at 00:27, Vladimir Oltean <olteanv@xxxxxxxxx> wrote:
>
> On Sat, 13 Apr 2019 at 19:38, Andrew Lunn <andrew@xxxxxxx> wrote:
> >
> > On Sat, Apr 13, 2019 at 04:28:16AM +0300, Vladimir Oltean wrote:
> > > In order to support this, we are creating a make-shift switch tag out of
> > > a VLAN trunk configured on the CPU port. Termination of normal traffic
> > > on switch ports only works when not under a vlan_filtering bridge.
> > > Termination of management (PTP, BPDU) traffic works under all
> > > circumstances because it uses a different tagging mechanism
> > > (incl_srcpt). We are making use of the generic CONFIG_NET_DSA_TAG_8021Q
> > > code and leveraging it from our own CONFIG_NET_DSA_TAG_SJA1105.
> > >
> > > There are two types of traffic: regular and link-local.
> > > The link-local traffic received on the CPU port is trapped from the
> > > switch's regular forwarding decisions because it matched one of the two
> > > DMAC filters for management traffic.
> > > On transmission, the switch requires special massaging for these
> > > link-local frames. Due to a weird implementation of the switching IP, by
> > > default it drops link-local frames that originate on the CPU port. It
> > > needs to be told where to forward them to, through an SPI command
> > > ("management route") that is valid for only a single frame.
> > > So when we're sending link-local traffic, we need to clone skb's from
> > > DSA and send them in our custom xmit worker that also performs SPI access.
> > >
> > > For that purpose, the DSA xmit handler and the xmit worker communicate
> > > through a per-port "skb ring" software structure, with a producer and a
> > > consumer index. At the moment this structure is rather fragile
> > > (ping-flooding to a link-local DMAC would cause most of the frames to
> > > get dropped). I would like to move the management traffic on a separate
> > > netdev queue that I can stop when the skb ring got full and hardware is
> > > busy processing, so that we are not forced to drop traffic.
> > >
> > > Signed-off-by: Vladimir Oltean <olteanv@xxxxxxxxx>
> > > Reviewed-by: Florian Fainelli <f.fainelli@xxxxxxxxx>
> > > ---
> > > Changes in v3:
> > > Made management traffic be receivable on the DSA netdevices even when
> > > switch tagging is disabled, as well as regular traffic be receivable on
> > > the master netdevice in the same scenario. Both are accomplished using
> > > the sja1105_filter() function and some small touch-ups in the .rcv
> > > callback.
> >
> > It seems like you made major changes to this. When you do that, you
> > should drop any reviewed-by tags you have. They are no longer valid
> > because of the major changes.
> >
>
> Ok, noted.
>
> > > /* This callback needs to be present */
> > > @@ -1141,7 +1158,11 @@ static int sja1105_vlan_filtering(struct dsa_switch *ds, int port, bool enabled)
> > > if (rc)
> > > dev_err(ds->dev, "Failed to change VLAN Ethertype\n");
> > >
> > > - return rc;
> > > + /* Switch port identification based on 802.1Q is only passable
> >
> > possible, not passable.
> >
>
> Passable (satisfactory, decent, acceptable) is what I wanted to say.
> Tagging using VLANs is possible even when the bridge wants to use
> them, but it's smarter not to go there. But I get your point, maybe
> I'll rephrase.
>
> > > + * if we are not under a vlan_filtering bridge. So make sure
> > > + * the two configurations are mutually exclusive.
> > > + */
> > > + return sja1105_setup_8021q_tagging(ds, !enabled);
> > > }
> > >
> > > static void sja1105_vlan_add(struct dsa_switch *ds, int port,
> > > @@ -1233,9 +1254,107 @@ static int sja1105_setup(struct dsa_switch *ds)
> > > */
> > > ds->vlan_filtering_is_global = true;
> > >
> > > + /* The DSA/switchdev model brings up switch ports in standalone mode by
> > > + * default, and that means vlan_filtering is 0 since they're not under
> > > + * a bridge, so it's safe to set up switch tagging at this time.
> > > + */
> > > + return sja1105_setup_8021q_tagging(ds, true);
> > > +}
> > > +
> > > +#include "../../../net/dsa/dsa_priv.h"
> >
> > No. Don't use relative includes like this.
> >
> > What do you need from the header? Maybe move it into
> > include/linux/net/dsa.h
> >
>
> dsa_slave_to_master()
>
> > > +/* Deferred work is unfortunately necessary because setting up the management
> > > + * route cannot be done from atomit context (SPI transfer takes a sleepable
> > > + * lock on the bus)
> > > + */
> > > +static void sja1105_xmit_work_handler(struct work_struct *work)
> > > +{
> > > + struct sja1105_port *sp = container_of(work, struct sja1105_port,
> > > + xmit_work);
> > > + struct sja1105_private *priv = sp->dp->ds->priv;
> > > + struct net_device *slave = sp->dp->slave;
> > > + struct net_device *master = dsa_slave_to_master(slave);
> > > + int port = (uintptr_t)(sp - priv->ports);
> > > + struct sk_buff *skb;
> > > + int i, rc;
> > > +
> > > + while ((i = sja1105_skb_ring_get(&sp->xmit_ring, &skb)) >= 0) {
> > > + struct sja1105_mgmt_entry mgmt_route = { 0 };
> > > + struct ethhdr *hdr;
> > > + int timeout = 10;
> > > + int skb_len;
> > > +
> > > + skb_len = skb->len;
> > > + hdr = eth_hdr(skb);
> > > +
> > > + mgmt_route.macaddr = ether_addr_to_u64(hdr->h_dest);
> > > + mgmt_route.destports = BIT(port);
> > > + mgmt_route.enfport = 1;
> > > + mgmt_route.tsreg = 0;
> > > + mgmt_route.takets = false;
> > > +
> > > + rc = sja1105_dynamic_config_write(priv, BLK_IDX_MGMT_ROUTE,
> > > + port, &mgmt_route, true);
> > > + if (rc < 0) {
> > > + kfree_skb(skb);
> > > + slave->stats.tx_dropped++;
> > > + continue;
> > > + }
> > > +
> > > + /* Transfer skb to the host port. */
> > > + skb->dev = master;
> > > + dev_queue_xmit(skb);
> > > +
> > > + /* Wait until the switch has processed the frame */
> > > + do {
> > > + rc = sja1105_dynamic_config_read(priv, BLK_IDX_MGMT_ROUTE,
> > > + port, &mgmt_route);
> > > + if (rc < 0) {
> > > + slave->stats.tx_errors++;
> > > + dev_err(priv->ds->dev,
> > > + "xmit: failed to poll for mgmt route\n");
> > > + continue;
> > > + }
> > > +
> > > + /* UM10944: The ENFPORT flag of the respective entry is
> > > + * cleared when a match is found. The host can use this
> > > + * flag as an acknowledgment.
> > > + */
> > > + cpu_relax();
> > > + } while (mgmt_route.enfport && --timeout);
> > > +
> > > + if (!timeout) {
> > > + dev_err(priv->ds->dev, "xmit timed out\n");
> > > + slave->stats.tx_errors++;
> > > + continue;
> > > + }
> > > +
> > > + slave->stats.tx_packets++;
> > > + slave->stats.tx_bytes += skb_len;
> > > + }
> > > +}
> > > +
> > > +static int sja1105_port_enable(struct dsa_switch *ds, int port,
> > > + struct phy_device *phydev)
> > > +{
> > > + struct sja1105_private *priv = ds->priv;
> > > + struct sja1105_port *sp = &priv->ports[port];
> > > +
> > > + sp->dp = &ds->ports[port];
> > > + INIT_WORK(&sp->xmit_work, sja1105_xmit_work_handler);
> > > return 0;
> > > }
> >
> > I think i'm missing something here. You have a per port queue of link
> > local frames which need special handling. And you have a per-port work
> > queue. To send such a frame, you need to write some register, send the
> > frame, and then wait until the mgmt_route.enfport is reset.
> >
> > Why are you doing this per port? How do you stop two ports/work queues
> > running at the same time? It seems like one queue, with one work queue
> > would be a better structure.
> >
>
> See the "port" parameter to this call here:
>
> rc = sja1105_dynamic_config_write(priv, BLK_IDX_MGMT_ROUTE,
> *port*, &mgmt_route, true);
>
> The switch IP aptly allocates 4 slots for management routes. And it's
> a 5-port switch where 1 port is the management port. I think the
> structure is fine.
>

"How do stop two work queues": if you're talking about contention on
the hardware management route, I responded to that.
If you're talking about netif_stop_queue(), I think I'm going to avoid
that altogether by not having a finite sized ring structure. While
studying the Marvell 88e6060 driver I found out that sk_buff_head
exists. Now I'm part of the elite club of wheel reinventors with my
struct sja1105_skb_ring :)


> > Also, please move all this code into the tagger. Just add exports for
> > sja1105_dynamic_config_write() and sja1105_dynamic_config_read().
> >
>
> Well, you see, the tagger code is part of the dsa_core object. If I
> export function symbols from the driver, those still won't be there if
> I compile the driver as a module. On the other hand, the way I'm doing
> it, I think the schedule_work() gives me a pretty good separation.
>
> > > +static void sja1105_port_disable(struct dsa_switch *ds, int port)
> > > +{
> > > + struct sja1105_private *priv = ds->priv;
> > > + struct sja1105_port *sp = &priv->ports[port];
> > > + struct sk_buff *skb;
> > > +
> > > + cancel_work_sync(&sp->xmit_work);
> > > + while (sja1105_skb_ring_get(&sp->xmit_ring, &skb) >= 0)
> > > + kfree_skb(skb);
> > > +}
> > > +
> > > diff --git a/net/dsa/tag_sja1105.c b/net/dsa/tag_sja1105.c
> > > new file mode 100644
> > > index 000000000000..5c76a06c9093
> > > --- /dev/null
> > > +++ b/net/dsa/tag_sja1105.c
> > > @@ -0,0 +1,148 @@
> > > +// SPDX-License-Identifier: GPL-2.0
> > > +/* Copyright (c) 2019, Vladimir Oltean <olteanv@xxxxxxxxx>
> > > + */
> > > +#include <linux/etherdevice.h>
> > > +#include <linux/if_vlan.h>
> > > +#include <linux/dsa/sja1105.h>
> > > +#include "../../drivers/net/dsa/sja1105/sja1105.h"
> >
> > Again, no, don't do this.
> >
>
> This separation between driver and tagger is fairly arbitrary.
> I need access to the driver's private structure, in order to get a
> hold of the private shadow of the dsa_port. Moving the driver private
> structure to include/linux/dsa/ would pull in quite a number of
> dependencies. Maybe I could provide declarations for the most of them,
> but anyway the private structure wouldn't be so private any longer,
> would it?
> Otherwise put, would you prefer a dp->priv similar to the already
> existing ds->priv? struct sja1105_port is much more lightweight to
> keep in include/linux/dsa/.
>
> > > +
> > > +#include "dsa_priv.h"
> > > +
> > > +/* Similar to is_link_local_ether_addr(hdr->h_dest) but also covers PTP */
> > > +static inline bool sja1105_is_link_local(const struct sk_buff *skb)
> > > +{
> > > + const struct ethhdr *hdr = eth_hdr(skb);
> > > + u64 dmac = ether_addr_to_u64(hdr->h_dest);
> > > +
> > > + if ((dmac & SJA1105_LINKLOCAL_FILTER_A_MASK) ==
> > > + SJA1105_LINKLOCAL_FILTER_A)
> > > + return true;
> > > + if ((dmac & SJA1105_LINKLOCAL_FILTER_B_MASK) ==
> > > + SJA1105_LINKLOCAL_FILTER_B)
> > > + return true;
> > > + return false;
> > > +}
> > > +
> > > +static bool sja1105_filter(const struct sk_buff *skb, struct net_device *dev)
> > > +{
> > > + if (sja1105_is_link_local(skb))
> > > + return true;
> > > + if (!dev->dsa_ptr->vlan_filtering)
> > > + return true;
> > > + return false;
> > > +}
> >
> > Please add a comment here about what frames cannot be handled by the
> > tagger. However, i'm not too happy about this design...
> >
>
> Ok, let's put this another way.
> A switch is primarily a device used to offload the forwarding of
> traffic based on L2 rules. Additionally there may be some management
> traffic for stuff like STP that needs to be terminated on the host
> port of the switch. For that, the hardware's job is to filter and tag
> management frames on their way to the host port, and the software's
> job is to process the source port and switch id information in a
> meaningful way.
> Now both this particular switch hardware, and DSA, are taking the
> above definitions to extremes.
> The switch says: "that's all you want to see? ok, so that's all I'm
> going to give you". So its native (hardware) tagging protocol is to
> trap link-local traffic and overwrite two bytes of its destination MAC
> with the switch ID and the source port. No more, no less. It is an
> incomplete solution, but it does the job for practical use cases.
> Now DSA says: "I want these to be fully capable net devices, I want
> the user to not even realize what's going on under the hood". I don't
> think that terminating iperf traffic through switch ports is a
> realistic usage scenario. So in a way discussions about performance
> and optimizations on DSA hotpath are slightly pointless IMO.
> Now what my driver says is that it offers a bit of both. It speaks the
> hardware's tagging protocol so it is capable of management traffic,
> but it also speaks the DSA paradigm, so in a way pushes the hardware
> to work in a mode it was never intended to, by repurposing VLANs when
> the user doesn't request them. So on one hand there is some overlap
> between the hardware tagging protocol and the VLAN one (in standalone
> mode and in VLAN-unaware bridged mode, management traffic *could* use
> VLAN tagging but it doesn't rely on it), and on the other hand the
> reunion of the two tagging protocols is decent, but still doesn't
> cover the entire spectrum (when put under a VLAN-aware bridge, you
> lose the ability to decode general traffic). So you'd better not rely
> on VLANs to decode the management traffic, because you won't be able
> to always rely on that, and that is a shame since a bridge with both
> vlan_filtering 1 and stp_state 1 is a real usage scenario, and the
> hardware is capable of that combination.
> But all of that is secondary. Let's forget about VLAN tagging for a
> second and concentrate on the tagging of management traffic. The
> limiting factor here is the software architecture of DSA, because in
> order for me to decode that in the driver/tagger, I'd have to drop
> everything else coming on the master net device (I explained in 13/24
> why). I believe that DSA being all-or-nothing about switch tagging is
> turning a blind eye to the devices that don't go overboard with
> features, and give you what's needed in a real-world design but not
> much else.
> What would you improve about this design (assuming you're talking
> about the filter function)?
>
> Thanks,
> -Vladimir
>
>
>
>
>
> > > +
> > > +static struct sk_buff *sja1105_xmit(struct sk_buff *skb,
> > > + struct net_device *netdev)
> > > +{
> > > + struct dsa_port *dp = dsa_slave_to_port(netdev);
> > > + struct dsa_switch *ds = dp->ds;
> > > + struct sja1105_private *priv = ds->priv;
> > > + struct sja1105_port *sp = &priv->ports[dp->index];
> > > + struct sk_buff *clone;
> > > +
> > > + if (likely(!sja1105_is_link_local(skb))) {
> > > + /* Normal traffic path. */
> > > + u16 tx_vid = dsa_tagging_tx_vid(ds, dp->index);
> > > + u8 pcp = skb->priority;
> > > +
> > > + /* If we are under a vlan_filtering bridge, IP termination on
> > > + * switch ports based on 802.1Q tags is simply too brittle to
> > > + * be passable. So just defer to the dsa_slave_notag_xmit
> > > + * implementation.
> > > + */
> > > + if (dp->vlan_filtering)
> > > + return skb;
> > > +
> > > + return dsa_8021q_xmit(skb, netdev, ETH_P_EDSA,
> > > + ((pcp << VLAN_PRIO_SHIFT) | tx_vid));
> >
> > Please don't reuse ETH_P_EDSA. Define an ETH_P_SJA1105.
> >
> > > + }
> > > +
> > > + /* Code path for transmitting management traffic. This does not rely
> > > + * upon switch tagging, but instead SPI-installed management routes.
> > > + */
> > > + clone = skb_clone(skb, GFP_ATOMIC);
> > > + if (!clone) {
> > > + dev_err(ds->dev, "xmit: failed to clone skb\n");
> > > + return NULL;
> > > + }
> > > +
> > > + if (sja1105_skb_ring_add(&sp->xmit_ring, clone) < 0) {
> > > + dev_err(ds->dev, "xmit: skb ring full\n");
> > > + kfree_skb(clone);
> > > + return NULL;
> > > + }
> > > +
> > > + if (sp->xmit_ring.count == SJA1105_SKB_RING_SIZE)
> > > + /* TODO setup a dedicated netdev queue for management traffic
> > > + * so that we can selectively apply backpressure and not be
> > > + * required to stop the entire traffic when the software skb
> > > + * ring is full. This requires hooking the ndo_select_queue
> > > + * from DSA and matching on mac_fltres.
> > > + */
> > > + dev_err(ds->dev, "xmit: reached maximum skb ring size\n");
> >
> > This should be rate limited.
> >
> > Andrew
> >
> > > +
> > > + schedule_work(&sp->xmit_work);
> > > + /* Let DSA free its reference to the skb and we will free
> > > + * the clone in the deferred worker
> > > + */
> > > + return NULL;
> > > +}
> > > +
> > > +static struct sk_buff *sja1105_rcv(struct sk_buff *skb,
> > > + struct net_device *netdev,
> > > + struct packet_type *pt)
> > > +{
> > > + unsigned int source_port, switch_id;
> > > + struct ethhdr *hdr = eth_hdr(skb);
> > > + struct sk_buff *nskb;
> > > + u16 tpid, vid, tci;
> > > + bool is_tagged;
> > > +
> > > + nskb = dsa_8021q_rcv(skb, netdev, pt, &tpid, &tci);
> > > + is_tagged = (nskb && tpid == ETH_P_EDSA);
> > > +
> > > + skb->priority = (tci & VLAN_PRIO_MASK) >> VLAN_PRIO_SHIFT;
> > > + vid = tci & VLAN_VID_MASK;
> > > +
> > > + skb->offload_fwd_mark = 1;
> > > +
> > > + if (likely(!sja1105_is_link_local(skb))) {
> > > + /* Normal traffic path. */
> > > + source_port = dsa_tagging_rx_source_port(vid);
> > > + switch_id = dsa_tagging_rx_switch_id(vid);
> > > + } else {
> > > + /* Management traffic path. Switch embeds the switch ID and
> > > + * port ID into bytes of the destination MAC, courtesy of
> > > + * the incl_srcpt options.
> > > + */
> > > + source_port = hdr->h_dest[3];
> > > + switch_id = hdr->h_dest[4];
> > > + /* Clear the DMAC bytes that were mangled by the switch */
> > > + hdr->h_dest[3] = 0;
> > > + hdr->h_dest[4] = 0;
> > > + }
> > > +
> > > + skb->dev = dsa_master_find_slave(netdev, switch_id, source_port);
> > > + if (!skb->dev) {
> > > + netdev_warn(netdev, "Couldn't decode source port\n");
> > > + return NULL;
> > > + }
> > > +
> > > + /* Delete/overwrite fake VLAN header, DSA expects to not find
> > > + * it there, see dsa_switch_rcv: skb_push(skb, ETH_HLEN).
> > > + */
> > > + if (is_tagged)
> > > + memmove(skb->data - ETH_HLEN, skb->data - ETH_HLEN - VLAN_HLEN,
> > > + ETH_HLEN - VLAN_HLEN);
> > > +
> > > + return skb;
> > > +}
> > > +
> > > +const struct dsa_device_ops sja1105_netdev_ops = {
> > > + .xmit = sja1105_xmit,
> > > + .rcv = sja1105_rcv,
> > > + .filter = sja1105_filter,
> > > + .overhead = VLAN_HLEN,
> > > +};
> > > +
> > > --
> > > 2.17.1
> > >