Re: [PATCH 2/3] net: dsa: b53: mmap: register MDIO Mux bus controller

From: Álvaro Fernández Rojas
Date: Fri Mar 17 2023 - 12:46:03 EST


El vie, 17 mar 2023 a las 17:41, Florian Fainelli
(<f.fainelli@xxxxxxxxx>) escribió:
>
> On 3/17/23 09:23, Álvaro Fernández Rojas wrote:
> > El vie, 17 mar 2023 a las 15:29, Vladimir Oltean (<olteanv@xxxxxxxxx>) escribió:
> >>
> >> On Fri, Mar 17, 2023 at 03:17:12PM +0100, Álvaro Fernández Rojas wrote:
> >>>> The proposed solution is too radical for a problem that was not properly
> >>>> characterized yet, so this patch set has my temporary NACK.
> >>>
> >>> Forgive me, but why do you consider this solution too radical?
> >>
> >> Because it involves changing device tree bindings (stable ABI) in an
> >> incompatible way.
> >>
> >>>>
> >>>>> But maybe Florian or Jonas can give some more details about the issue...
> >>>>
> >>>> I think you also have the tools necessary to investigate this further.
> >>>> We need to know what resource belonging to the switch is it that the
> >>>> MDIO mux needs. Where is the earliest place you can add the call to
> >>>> b53_mmap_mdiomux_init() such that your board works reliably? Note that
> >>>> b53_switch_register() indirectly calls b53_setup(). By placing this
> >>>> function where you have, the entirety of b53_setup() has finished
> >>>> execution, and we don't know exactly what is it from there that is
> >>>> needed.
> >>>
> >>> In the following link you will find different bootlogs related to
> >>> different scenarios all of them with the same result: any attempt of
> >>> calling b53_mmap_mdiomux_init() earlier than b53_switch_register()
> >>> will either result in a kernel panic or a device hang:
> >>> https://gist.github.com/Noltari/b0bd6d5211160ac7bf349d998d21e7f7
> >>>
> >>> 1. before b53_switch_register():
> >>>
> >>> 2. before dsa_register_switch():
> >>>
> >>> 3. before b53_switch_init():
> >>
> >> Did you read what I said?
> >
> > Yes, but I didn't get your point, sorry for that.
> >
> >>
> >> | Note that b53_switch_register() indirectly calls b53_setup(). By placing
> >> | this function where you have, the entirety of b53_setup() has finished
> >> | execution, and we don't know exactly what is it from there that is
> >> | needed.
> >>
> >> Can you place the b53_mmap_mdiomux_init() in various places within
> >> b53_setup() to restrict the search further?
> >
> > I tried and these are the results:
> > https://gist.github.com/Noltari/d5bdba66b8f2e392c9e4c2759661d862
> >
> > All of them hang when dsa_tree_setup() is called for DSA tree 1
> > (external switch) without having completely setup DSA tree 0 (internal
> > switch):
> > [ 1.471345] b53-switch 10e00000.switch: found switch: BCM63xx, rev 0
> > [ 1.481099] bcm6368-enetsw 1000d800.ethernet: IRQ tx not found
> > [ 1.506752] bcm6368-enetsw 1000d800.ethernet: mtd mac 4c:60:de:86:52:12
> > [ 1.594365] bcm7038-wdt 1000005c.watchdog: Registered BCM7038 Watchdog
> > [ 1.612008] NET: Registered PF_INET6 protocol family
> > [ 1.645617] Segment Routing with IPv6
> > [ 1.649547] In-situ OAM (IOAM) with IPv6
> > [ 1.653948] NET: Registered PF_PACKET protocol family
> > [ 1.659984] 8021q: 802.1Q VLAN Support v1.8
> > [ 1.699193] b53-switch 10e00000.switch: found switch: BCM63xx, rev 0
> > [ 2.124257] bcm53xx 0.1:1e: found switch: BCM53125, rev 4
> > *** Device hang ***
> >
> > I don't know if there's a way to defer the probe of DSA tree 1 (the
> > external switch) until DSA tree 0 (the internal switch) is completely
> > setup, because that would probably be the only alternative way of
> > fixing this.
>
> Could you find out which part is hanging? It looks like there is a busy
> waiting operation that we never complete?

I don't think so, but I will try to debug the exact issue and report back.

>
> DSA should be perfectly capable of dealing with disjoint trees being
> cascaded to one another, as this is entirely within how the framework is
> designed.
>
> What I suspect might be happening is a "double programming" effect,
> similar or identical to what was described in this commit:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b8c6cd1d316f3b01ae578d8e29179f6396c0eaa2

Thanks for the info, I will look into this.

>
> using the MDIO mux would properly isolate the pseudo PHYs of the switch
> such that a given MDIO write does not end up programming *both* the
> internal and external switches. It could also be a completely different
> problem.
> --
> Florian
>