Re: [PATCH 2/3] net: dsa: b53: mmap: register MDIO Mux bus controller

From: Álvaro Fernández Rojas
Date: Mon Mar 20 2023 - 11:29:10 EST


El lun, 20 mar 2023 a las 11:27, Jonas Gorski
(<jonas.gorski@xxxxxxxxx>) escribió:
>
> On Sun, 19 Mar 2023 at 10:45, Álvaro Fernández Rojas <noltari@xxxxxxxxx> wrote:
> >
> > El vie, 17 mar 2023 a las 17:41, Florian Fainelli
> > (<f.fainelli@xxxxxxxxx>) escribió:
> > >
> > > On 3/17/23 09:23, Álvaro Fernández Rojas wrote:
> > > > El vie, 17 mar 2023 a las 15:29, Vladimir Oltean (<olteanv@xxxxxxxxx>) escribió:
> > > >>
> > > >> On Fri, Mar 17, 2023 at 03:17:12PM +0100, Álvaro Fernández Rojas wrote:
> > > >>>> The proposed solution is too radical for a problem that was not properly
> > > >>>> characterized yet, so this patch set has my temporary NACK.
> > > >>>
> > > >>> Forgive me, but why do you consider this solution too radical?
> > > >>
> > > >> Because it involves changing device tree bindings (stable ABI) in an
> > > >> incompatible way.
> > > >>
> > > >>>>
> > > >>>>> But maybe Florian or Jonas can give some more details about the issue...
> > > >>>>
> > > >>>> I think you also have the tools necessary to investigate this further.
> > > >>>> We need to know what resource belonging to the switch is it that the
> > > >>>> MDIO mux needs. Where is the earliest place you can add the call to
> > > >>>> b53_mmap_mdiomux_init() such that your board works reliably? Note that
> > > >>>> b53_switch_register() indirectly calls b53_setup(). By placing this
> > > >>>> function where you have, the entirety of b53_setup() has finished
> > > >>>> execution, and we don't know exactly what is it from there that is
> > > >>>> needed.
> > > >>>
> > > >>> In the following link you will find different bootlogs related to
> > > >>> different scenarios all of them with the same result: any attempt of
> > > >>> calling b53_mmap_mdiomux_init() earlier than b53_switch_register()
> > > >>> will either result in a kernel panic or a device hang:
> > > >>> https://gist.github.com/Noltari/b0bd6d5211160ac7bf349d998d21e7f7
> > > >>>
> > > >>> 1. before b53_switch_register():
> > > >>>
> > > >>> 2. before dsa_register_switch():
> > > >>>
> > > >>> 3. before b53_switch_init():
> > > >>
> > > >> Did you read what I said?
> > > >
> > > > Yes, but I didn't get your point, sorry for that.
> > > >
> > > >>
> > > >> | Note that b53_switch_register() indirectly calls b53_setup(). By placing
> > > >> | this function where you have, the entirety of b53_setup() has finished
> > > >> | execution, and we don't know exactly what is it from there that is
> > > >> | needed.
> > > >>
> > > >> Can you place the b53_mmap_mdiomux_init() in various places within
> > > >> b53_setup() to restrict the search further?
> > > >
> > > > I tried and these are the results:
> > > > https://gist.github.com/Noltari/d5bdba66b8f2e392c9e4c2759661d862
> > > >
> > > > All of them hang when dsa_tree_setup() is called for DSA tree 1
> > > > (external switch) without having completely setup DSA tree 0 (internal
> > > > switch):
> > > > [ 1.471345] b53-switch 10e00000.switch: found switch: BCM63xx, rev 0
> > > > [ 1.481099] bcm6368-enetsw 1000d800.ethernet: IRQ tx not found
> > > > [ 1.506752] bcm6368-enetsw 1000d800.ethernet: mtd mac 4c:60:de:86:52:12
> > > > [ 1.594365] bcm7038-wdt 1000005c.watchdog: Registered BCM7038 Watchdog
> > > > [ 1.612008] NET: Registered PF_INET6 protocol family
> > > > [ 1.645617] Segment Routing with IPv6
> > > > [ 1.649547] In-situ OAM (IOAM) with IPv6
> > > > [ 1.653948] NET: Registered PF_PACKET protocol family
> > > > [ 1.659984] 8021q: 802.1Q VLAN Support v1.8
> > > > [ 1.699193] b53-switch 10e00000.switch: found switch: BCM63xx, rev 0
> > > > [ 2.124257] bcm53xx 0.1:1e: found switch: BCM53125, rev 4
> > > > *** Device hang ***
> > > >
> > > > I don't know if there's a way to defer the probe of DSA tree 1 (the
> > > > external switch) until DSA tree 0 (the internal switch) is completely
> > > > setup, because that would probably be the only alternative way of
> > > > fixing this.
> > >
> > > Could you find out which part is hanging? It looks like there is a busy
> > > waiting operation that we never complete?
> >
> > After many tests I was able to find the part that was hanging the device.
> > It turns out that if the MDIO bus controller is registered soon
> > enough, b53_phy_read16 will be called for the RGMII port on the
> > internal switch:
> > [ 4.042698] b53-switch 10e00000.switch: b53_phy_read16: ds=81fede80
> > phy_read16=00000000 addr=4 reg=2
> > It turns out that the device is hanging on the following line of
> > b53_phy_read16():
> > b53_read16(priv, B53_PORT_MII_PAGE(addr), reg * 2, &value);
> > Maybe it's not safe to access B53_PORT_MII_PAGE() on MMAP switches?
>
> If you are following the example from 1/3, then I think I see what the
> issue might be here:
>
> You are labeling the port where the external switch is connected as
> "extsw", which is neither "cpu" nor "dsa", so it is treated as a
> normal/user port (which it shouldn't). If you change its label to
> "dsa" (which AFAIU would be the correct one to denote a daisy chained
> switch) it should not try to access port 4's MDIO registers (via the
> slave mdio bus).

Correct me if I'm wrong, but I think that the configuration you're
suggesting would be for a different kind of switches layout.
In this case I'm using a disjoint tree setup since both switches are
using different tags incompatible with each other.

So the proper switch layout should be the following (for a Huawei
HG253s, which is a BCM6362 with a WAN port on the internal switch port
5 and the external switch BCM53124S connected to the internal switch
on port 4):
https://gist.github.com/Noltari/975374f908bb056ecf2544d289255b2e#file-b53-disjoint-switch-trees-hg253s-v2-dts

BTW, this is the log (as you can see b53_mmap_phy_read16() is used for
ports 4 & 5 at the beginning. After the switch initialization, the
mdio-mux bus controller is used):
https://gist.github.com/Noltari/975374f908bb056ecf2544d289255b2e#file-b53-disjoint-switch-trees-hg253s-v2-log

I think that we need this kind of layout because as we already
discussed on "net: dsa: tag_brcm: legacy: fix daisy-chained switches"
(https://patchwork.kernel.org/project/netdevbpf/patch/20230317120815.321871-1-noltari@xxxxxxxxx/)
it's the only way of removing the incorrect VLAN tag added by the
internal BCM63xx switch when it receives a packet from the external
switch on port 4 (RGMII).
Otherwise, the double tag won't be processed correctly due to the
invalid VLAN tag and the packet won't be correctly forwarded from the
external switch to the internal switch.

But I may be wrong here since I don't have much experience with DSA...
That's why I decided to upstream all the patches that I sent lately,
to seek for help of the experts :)

>
> Can you check if that helps?
>
> Regards
> Jonas

Best regards,
Álvaro.