RE: [PATCH] net: stmmac: synchronize stmmac_open and stmmac_dvr_probe
From: Kweh, Hock Leong
Date: Tue Dec 27 2016 - 00:25:40 EST
> -----Original Message-----
> From: Florian Fainelli [mailto:f.fainelli@xxxxxxxxx]
> Sent: Tuesday, December 27, 2016 1:14 PM
> To: Kweh, Hock Leong <hock.leong.kweh@xxxxxxxxx>; David S. Miller
> <davem@xxxxxxxxxxxxx>; Joao Pinto <Joao.Pinto@xxxxxxxxxxxx>; Giuseppe
> CAVALLARO <peppe.cavallaro@xxxxxx>; seraphin.bonnaffe@xxxxxx
> Cc: Alexandre TORGUE <alexandre.torgue@xxxxxxxxx>; Joachim Eastwood
> <manabian@xxxxxxxxx>; Niklas Cassel <niklas.cassel@xxxxxxxx>; Johan Hovold
> <johan@xxxxxxxxxx>; pavel@xxxxxx; Ong, Boon Leong
> <boon.leong.ong@xxxxxxxxx>; netdev <netdev@xxxxxxxxxxxxxxx>; LKML <linux-
> kernel@xxxxxxxxxxxxxxx>; Voon, Weifeng <weifeng.voon@xxxxxxxxx>; Lars
> Persson <lars.persson@xxxxxxxx>
> Subject: Re: [PATCH] net: stmmac: synchronize stmmac_open and
> stmmac_dvr_probe
>
>
>
> On 12/26/2016 09:10 PM, Florian Fainelli wrote:
> >
> >
> > On 12/27/2016 03:44 AM, Kweh, Hock Leong wrote:
> >> From: "Kweh, Hock Leong" <hock.leong.kweh@xxxxxxxxx>
> >>
> >> If the stmmac driver is loaded as a kernel module after the OS has booted,
> >> there is a race condition between stmmac_open() and stmmac_mdio_register(),
> >> which is invoked inside stmmac_dvr_probe(). The error shows up in the dmesg
> >> log as PHY not found and stmmac_open() failing:
> >> [ 473.919358] stmmaceth 0000:01:00.0 (unnamed net_device) (uninitialized):
> >> stmmac_dvr_probe: warning: cannot get CSR clock
> >> [ 473.919382] stmmaceth 0000:01:00.0: no reset control found
> >> [ 473.919412] stmmac - user ID: 0x10, Synopsys ID: 0x42
> >> [ 473.919429] stmmaceth 0000:01:00.0: DMA HW capability register supported
> >> [ 473.919436] stmmaceth 0000:01:00.0: RX Checksum Offload Engine supported
> >> [ 473.919443] stmmaceth 0000:01:00.0: TX Checksum insertion supported
> >> [ 473.919451] stmmaceth 0000:01:00.0 (unnamed net_device) (uninitialized):
> >> Enable RX Mitigation via HW Watchdog Timer
> >> [ 473.921395] libphy: PHY stmmac-1:00 not found
> >> [ 473.921417] stmmaceth 0000:01:00.0 eth0: Could not attach to PHY
> >> [ 473.921427] stmmaceth 0000:01:00.0 eth0: stmmac_open: Cannot attach to
> >> PHY (error: -19)
> >> [ 473.959710] libphy: stmmac: probed
> >> [ 473.959724] stmmaceth 0000:01:00.0 eth0: PHY ID 01410cc2 at 0 IRQ POLL
> >> (stmmac-1:00) active
> >> [ 473.959728] stmmaceth 0000:01:00.0 eth0: PHY ID 01410cc2 at 1 IRQ POLL
> >> (stmmac-1:01)
> >> [ 473.959731] stmmaceth 0000:01:00.0 eth0: PHY ID 01410cc2 at 2 IRQ POLL
> >> (stmmac-1:02)
> >> [ 473.959734] stmmaceth 0000:01:00.0 eth0: PHY ID 01410cc2 at 3 IRQ POLL
> >> (stmmac-1:03)
> >>
> >> The fix uses wait_for_completion_interruptible() to synchronize
> >> stmmac_open() and stmmac_dvr_probe(), preventing the race condition.
> >
> > The proper fix for this would be to have register_netdev() be the last
> > thing done in stmmac_dvr_probe(), whereas right now, the last thing done
> > is stmmac_mdio_register(), leading to the window you are seeing here, where
> > the network interface can be opened prior to all resources being set up,
> > including, but not limited to, MDIO devices.
>
> Something like the following untested patch should plug this race:
>
> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> index bb40382e205d..5910ea51f8f6 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> @@ -3339,13 +3339,6 @@ int stmmac_dvr_probe(struct device *device,
>
>  	spin_lock_init(&priv->lock);
>
> -	ret = register_netdev(ndev);
> -	if (ret) {
> -		netdev_err(priv->dev, "%s: ERROR %i registering the device\n",
> -			   __func__, ret);
> -		goto error_netdev_register;
> -	}
> -
>  	/* If a specific clk_csr value is passed from the platform
>  	 * this means that the CSR Clock Range selection cannot be
>  	 * changed at run-time and it is fixed. Viceversa the driver'll try to
> @@ -3372,11 +3365,14 @@ int stmmac_dvr_probe(struct device *device,
>  		}
>  	}
>
> -	return 0;
> +	ret = register_netdev(ndev);
> +	if (ret)
> +		netdev_err(priv->dev, "%s: ERROR %i registering the device\n",
> +			   __func__, ret);
> +
> +	return ret;
>
>  error_mdio_register:
> -	unregister_netdev(ndev);
> -error_netdev_register:
>  	netif_napi_del(&priv->napi);
>  error_hw_init:
>  	clk_disable_unprepare(priv->pclk);
>
> --
> Florian
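
The ordering argument in the quoted thread can be illustrated with a minimal user-space sketch in C. This is not kernel code: struct mock_dev and every mock_* / probe_* name below are hypothetical stand-ins, and mock_register_netdev() assumes the worst case, in which the interface is opened the moment it becomes visible. Under those assumptions, registering first reproduces the -ENODEV (-19) from the log, while registering last does not.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for the driver state (not kernel code). */
struct mock_dev {
	bool mdio_ready;  /* set once the mock MDIO bus/PHY is registered */
	bool registered;  /* set once the mock netdev is visible to users */
	int open_result;  /* result of the open attempted at registration */
};

/* Mock open path: fails with -ENODEV when the PHY is not available yet. */
static int mock_open(struct mock_dev *dev)
{
	assert(dev != NULL);
	return dev->mdio_ready ? 0 : -ENODEV;
}

/*
 * Mock registration. Assumption: the interface is opened immediately
 * after it becomes visible, which is the worst case for the race.
 */
static void mock_register_netdev(struct mock_dev *dev)
{
	assert(dev != NULL);
	dev->registered = true;
	dev->open_result = mock_open(dev);
}

/* Mock MDIO registration: after this, the PHY can be attached. */
static void mock_mdio_register(struct mock_dev *dev)
{
	assert(dev != NULL);
	dev->mdio_ready = true;
}

/* Ordering described as the current one: netdev visible before MDIO. */
static int probe_register_first(struct mock_dev *dev)
{
	mock_register_netdev(dev);
	mock_mdio_register(dev);
	return dev->open_result;
}

/* Ordering suggested in the thread: registration is the last step. */
static int probe_register_last(struct mock_dev *dev)
{
	mock_mdio_register(dev);
	mock_register_netdev(dev);
	return dev->open_result;
}
```

Calling probe_register_first() on a zero-initialized struct mock_dev yields -ENODEV, and probe_register_last() yields 0. The sketch only models ordering; it says nothing about error unwinding, which the real probe function still has to handle.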
Thanks. I will try it out and confirm.
Regards,
Wilson