Re: Circular dependency between DSA switch driver and tagging protocol driver

From: Vladimir Oltean
Date: Wed Sep 08 2021 - 18:20:04 EST


On Wed, Sep 08, 2021 at 03:14:51PM -0700, Florian Fainelli wrote:
> On 9/8/2021 3:08 PM, Vladimir Oltean wrote:
> > Hi,
> >
> > Since commits 566b18c8b752 ("net: dsa: sja1105: implement TX
> > timestamping for SJA1110") and 994d2cbb08ca ("net: dsa: tag_sja1105: be
> > dsa_loop-safe"), net/dsa/tag_sja1105.ko has gained a build and insmod
> > time dependency on drivers/net/dsa/sja1105.ko, due to several symbols
> > exported by the latter and used by the former.
> >
> > So first one needs to insmod sja1105.ko, then insmod tag_sja1105.ko.
> >
> > But dsa_port_parse_cpu() returns -EPROBE_DEFER when
> > dsa_tag_protocol_get() returns -ENOPROTOOPT. This means that
> > DSA_TAG_PROTO_SJA1105 is not in the list of tagging protocols known to
> > DSA, so probing should be retried later. So DSA has a runtime
> > dependency on the tagging protocol being loaded. Combined with the
> > symbol dependency, this is a de facto circular dependency.
> >
> > So when we first insmod sja1105.ko, nothing happens, probing is deferred.
> >
> > Then when we insmod tag_sja1105.ko, we expect the DSA probing to pick
> > up where it left off, and probe the switch too.
> >
> > However this does not happen because the deferred probing list in the
> > device core is reconsidered for a new attempt only if a driver is bound
> > to a new device. But DSA tagging protocols are drivers with no struct
> > device.
> >
> > One can of course manually kick the driver after the two insmods:
> >
> > echo spi0.1 > /sys/bus/spi/drivers/sja1105/bind
> >
> > and this works, but automatic module loading based on modaliases will be
> > broken if both tag_sja1105.ko and sja1105.ko are modules, and sja1105 is
> > the last device to get a driver bound to it.
> >
> > Where is the problem?
>
> I'd say with 994d2cbb08ca, since the tagger now requires visibility into
> sja1105_switch_ops which is not great, to say the least. You could solve
> this by:
>
> - splitting up the sja1105 driver into a library that contains
> sja1105_switch_ops and a separate module that contains the driver
> registration code
>
> - finding a different way to do a dsa_switch_ops pointer comparison, by
> e.g.: maintaining a boolean in dsa_port that tracks whether a particular
> driver is backing that port

What about 566b18c8b752 ("net: dsa: sja1105: implement TX timestamping
for SJA1110")? It is essentially the same problem from a symbol usage
perspective, plus the fact that an skb queue belonging to the driver is
accessed.
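
For reference, the second suggestion quoted above (a boolean in the port
instead of a dsa_switch_ops pointer comparison) can be sketched roughly as
below. This is a standalone userspace model, not kernel code: every
model_* name, the is_sja1105 field and both helper functions are
hypothetical and only stand in for the real structures. It shows why the
flag removes the symbol dependency for that one check; it does not
address the skb queue access.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct dsa_switch_ops. */
struct model_switch_ops {
	const char *name;
};

/* Hypothetical stand-in for struct dsa_port. */
struct model_port {
	const struct model_switch_ops *ops;
	bool is_sja1105;	/* assumed flag, set only by the switch driver */
};

/* "Switch driver" side: owns the ops structure. */
const struct model_switch_ops model_sja1105_ops = { .name = "sja1105" };
const struct model_switch_ops model_loop_ops = { .name = "dsa_loop" };

void model_sja1105_setup_port(struct model_port *dp)
{
	assert(dp != NULL);
	dp->ops = &model_sja1105_ops;
	dp->is_sja1105 = true;
}

/*
 * "Tagger" side, pointer-comparison variant: needs the address of
 * model_sja1105_ops, i.e. a symbol exported by the switch driver.
 */
bool tagger_port_is_sja1105_by_ops(const struct model_port *dp)
{
	return dp->ops == &model_sja1105_ops;
}

/*
 * "Tagger" side, flag variant: reads only data stored in the port,
 * so it references no symbol owned by the switch driver.
 */
bool tagger_port_is_sja1105(const struct model_port *dp)
{
	return dp->is_sja1105;
}
```

In this model both checks give the same answer for any port set up through
model_sja1105_setup_port(), but only the first one forces the tagger to
link against the driver.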