Re: [PATCH 2/3] net: dsa: b53: mmap: register MDIO Mux bus controller

From: Florian Fainelli
Date: Fri Mar 17 2023 - 12:41:40 EST


On 3/17/23 09:23, Álvaro Fernández Rojas wrote:
El vie, 17 mar 2023 a las 15:29, Vladimir Oltean (<olteanv@xxxxxxxxx>) escribió:

On Fri, Mar 17, 2023 at 03:17:12PM +0100, Álvaro Fernández Rojas wrote:
The proposed solution is too radical for a problem that was not properly
characterized yet, so this patch set has my temporary NACK.

Forgive me, but why do you consider this solution too radical?

Because it involves changing device tree bindings (stable ABI) in an
incompatible way.


But maybe Florian or Jonas can give some more details about the issue...

I think you also have the tools necessary to investigate this further.
We need to know what resource belonging to the switch is it that the
MDIO mux needs. Where is the earliest place you can add the call to
b53_mmap_mdiomux_init() such that your board works reliably? Note that
b53_switch_register() indirectly calls b53_setup(). By placing this
function where you have, the entirety of b53_setup() has finished
execution, and we don't know exactly what is it from there that is
needed.

In the following link you will find different bootlogs related to
different scenarios all of them with the same result: any attempt of
calling b53_mmap_mdiomux_init() earlier than b53_switch_register()
will either result in a kernel panic or a device hang:
https://gist.github.com/Noltari/b0bd6d5211160ac7bf349d998d21e7f7

1. before b53_switch_register():

2. before dsa_register_switch():

3. before b53_switch_init():

Did you read what I said?

Yes, but I didn't get your point, sorry for that.


| Note that b53_switch_register() indirectly calls b53_setup(). By placing
| this function where you have, the entirety of b53_setup() has finished
| execution, and we don't know exactly what is it from there that is
| needed.

Can you place the b53_mmap_mdiomux_init() in various places within
b53_setup() to restrict the search further?

I tried and these are the results:
https://gist.github.com/Noltari/d5bdba66b8f2e392c9e4c2759661d862

All of them hang when dsa_tree_setup() is called for DSA tree 1
(external switch) without having completely setup DSA tree 0 (internal
switch):
[ 1.471345] b53-switch 10e00000.switch: found switch: BCM63xx, rev 0
[ 1.481099] bcm6368-enetsw 1000d800.ethernet: IRQ tx not found
[ 1.506752] bcm6368-enetsw 1000d800.ethernet: mtd mac 4c:60:de:86:52:12
[ 1.594365] bcm7038-wdt 1000005c.watchdog: Registered BCM7038 Watchdog
[ 1.612008] NET: Registered PF_INET6 protocol family
[ 1.645617] Segment Routing with IPv6
[ 1.649547] In-situ OAM (IOAM) with IPv6
[ 1.653948] NET: Registered PF_PACKET protocol family
[ 1.659984] 8021q: 802.1Q VLAN Support v1.8
[ 1.699193] b53-switch 10e00000.switch: found switch: BCM63xx, rev 0
[ 2.124257] bcm53xx 0.1:1e: found switch: BCM53125, rev 4
*** Device hang ***

I don't know if there's a way to defer the probe of DSA tree 1 (the
external switch) until DSA tree 0 (the internal switch) is completely
setup, because that would probably be the only alternative way of
fixing this.

Could you find out which part is hanging? It looks like there is a busy waiting operation that we never complete?

DSA should be perfectly capable of dealing with disjoint trees being cascaded to one another, as this is entirely within how the framework is designed.

What I suspect might be happening is a "double programming" effect, similar or identical to what was described in this commit:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b8c6cd1d316f3b01ae578d8e29179f6396c0eaa2

using the MDIO mux would properly isolate the pseudo PHYs of the switch such that a given MDIO write does not end up programming *both* the internal and external switches. It could also be a completely different problem.
--
Florian