Re: [PATCH] net: dsa: tag_brcm: legacy: fix daisy-chained switches

From: Florian Fainelli
Date: Fri Mar 17 2023 - 18:03:38 EST


On 3/17/23 14:51, Álvaro Fernández Rojas wrote:
El vie, 17 mar 2023 a las 17:55, Florian Fainelli
(<f.fainelli@xxxxxxxxx>) escribió:

On 3/17/23 09:49, Jonas Gorski wrote:
On Fri, 17 Mar 2023 at 17:32, Andrew Lunn <andrew@xxxxxxx> wrote:

On Fri, Mar 17, 2023 at 01:08:15PM +0100, Álvaro Fernández Rojas wrote:
When BCM63xx internal switches are connected to switches with a 4-byte
Broadcom tag, it does not identify the packet as VLAN tagged, so it adds one
based on its PVID (which is likely 0).
Right now, the packet is received by the BCM63xx internal switch and the 6-byte
tag is properly processed. The next step would to decode the corresponding
4-byte tag. However, the internal switch adds an invalid VLAN tag after the
6-byte tag and the 4-byte tag handling fails.
In order to fix this we need to remove the invalid VLAN tag after the 6-byte
tag before passing it to the 4-byte tag decoding.

Is there an errata for this invalid VLAN tag? Or is the driver simply
missing some configuration for it to produce a valid VLAN tag?

The description does not convince me you are fixing the correct
problem.

This isn't a bug per se, it's just the interaction of a packet going
through two tagging CPU ports.

My understanding of the behaviour is:

1. The external switch inserts a 4-byte Broadcom header before the
VLAN tag, and sends it to the internal switch.
2. The internal switch looks at the EtherType, finds it is not a VLAN
EtherType, so assumes it is untagged, and adds a VLAN tag based on the
configured PVID (which 0 in the default case).
3. The internal switch inserts a legacy 6-byte Broadcom header before
the VLAN tag when forwarding to its CPU port.

The internal switch does not know how to handle the (non-legacy)
Broadcom tag, so it does not know that there is a VLAN tag after it.

The internal switch enforces VLAN tags on its CPU port when it is in
VLAN enabled mode, regardless what the VLAN table's untag bit says.

The result is a bogus VID 0 and priority 0 tag between the two
Broadcom Headers. The VID would likely change based on the PVID of the
port of the external switch.

My understanding matches yours, at the very least, we should only strip
off the VLAN tag == 0, in case we are stacked onto a 4-bytes Broadcom
tag speaking switch, otherwise it seems to me we are stripping of VLAN
tags a bait too greedily.

Maybe I'm wrong here, but we're only removing the VLAN tag for a
specific case in which we shouldn't have any kind of VLAN tag, right?

For example, let's say we have an internal switch with the following ports:
- 0: LAN 1
- 4: RGMII -> External switch
- 8: CPU -> enetsw controller

And the external switch has the following ports:
- 0: LAN 2
- 1: LAN 3
...
- 8: CPU -> Internal switch RGMII

A. If we get a packet from LAN 1, it will only have the 6-bytes tag
(and optionally the VLAN tag).
When dsa_master_find_slave() is called, the net_device returned won't
have any kind of DSA protocol and therefore netdev_uses_dsa() will
return FALSE.

B. However, when a packet is received from LAN 2/3, the first tag
processed will be the 6-byte tag (corresponding to the internal
switch).
The 6-byte tag will identify this as coming from port 4 of the
internal switch (RGMII) and therefore dsa_master_find_slave() will
return the extsw interface which will have the DSA protocol of the
4-byte tag and netdev_uses_dsa() will return TRUE.

Only for the second case the invalid VLAN tag will be removed and
since extsw (RGMI) will never have VLANs enabled, I don't see the
problem that you suggest about removing the VLAN tags too greedily.

Am I wrong here?

I totally missed the netdev_uses_dsa() check you added on the looked up net_device, your explanation makes sense to me, thanks!
--
Florian