Re: [PATCH net] net: phy: Warn about incorrect mdio_bus_phy_resume() state

From: Marek Szyprowski
Date: Tue Aug 16 2022 - 07:46:15 EST



On 12.08.2022 18:32, Florian Fainelli wrote:
> On 8/12/22 04:19, Marek Szyprowski wrote:
>> Hi All,
>>
>> On 02.08.2022 01:34, Florian Fainelli wrote:
>>> Calling mdio_bus_phy_resume() with neither the PHY state machine set to
>>> PHY_HALTED nor phydev->mac_managed_pm set to true is a good indication
>>> that we can produce a race condition looking like this:
>>>
>>> CPU0                        CPU1
>>> bcmgenet_resume
>>>    -> phy_resume
>>>      -> phy_init_hw
>>>    -> phy_start
>>>      -> phy_resume
>>> phy_start_aneg()
>>> mdio_bus_phy_resume
>>>    -> phy_resume
>>>       -> phy_write(..., BMCR_RESET)
>>>        -> usleep()                                  -> phy_read()
>>>
>>> with the phy_resume() function triggering a PHY behavior that might
>>> have
>>> to be worked around with (see bf8bfc4336f7 ("net: phy: broadcom: Fix
>>> brcm_fet_config_init()") for instance) that ultimately leads to an
>>> error
>>> reading from the PHY.
>>>
>>> Fixes: fba863b81604 ("net: phy: make PHY PM ops a no-op if MAC
>>> driver manages PHY PM")
>>> Signed-off-by: Florian Fainelli <f.fainelli@xxxxxxxxx>
>>
>> This patch, as probably intended, triggers a warning during system
>> suspend/resume cycle in the SMSC911x driver. I've observed it on ARM
>> Juno R1 board on the kernel compiled from next-202208010:
>>
>>    ------------[ cut here ]------------
>>    WARNING: CPU: 1 PID: 398 at drivers/net/phy/phy_device.c:323
>> mdio_bus_phy_resume+0x34/0xc8
>>    Modules linked in: smsc911x cpufreq_powersave cpufreq_conservative
>> crct10dif_ce ip_tables x_tables ipv6 [last unloaded: smsc911x]
>>    CPU: 1 PID: 398 Comm: rtcwake Not tainted 5.19.0+ #940
>>    Hardware name: ARM Juno development board (r1) (DT)
>>    pstate: 80000005 (Nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>>    pc : mdio_bus_phy_resume+0x34/0xc8
>>    lr : dpm_run_callback+0x74/0x350
>>    ...
>>    Call trace:
>>     mdio_bus_phy_resume+0x34/0xc8
>>     dpm_run_callback+0x74/0x350
>>     device_resume+0xb8/0x258
>>     dpm_resume+0x120/0x4a8
>>     dpm_resume_end+0x14/0x28
>>     suspend_devices_and_enter+0x164/0xa60
>>     pm_suspend+0x25c/0x3a8
>>     state_store+0x84/0x108
>>     kobj_attr_store+0x14/0x28
>>     sysfs_kf_write+0x60/0x70
>>     kernfs_fop_write_iter+0x124/0x1a8
>>     new_sync_write+0xd0/0x190
>>     vfs_write+0x208/0x478
>>     ksys_write+0x64/0xf0
>>     __arm64_sys_write+0x14/0x20
>>     invoke_syscall+0x40/0xf8
>>     el0_svc_common.constprop.3+0x8c/0x120
>>     do_el0_svc+0x28/0xc8
>>     el0_svc+0x48/0xd0
>>     el0t_64_sync_handler+0x94/0xb8
>>     el0t_64_sync+0x15c/0x160
>>    irq event stamp: 24406
>>    hardirqs last  enabled at (24405): [<ffff8000090c4734>]
>> _raw_spin_unlock_irqrestore+0x8c/0x90
>>    hardirqs last disabled at (24406): [<ffff8000090b3164>]
>> el1_dbg+0x24/0x88
>>    softirqs last  enabled at (24144): [<ffff800008010488>]
>> _stext+0x488/0x5cc
>>    softirqs last disabled at (24139): [<ffff80000809bf98>]
>> irq_exit_rcu+0x168/0x1a8
>>    ---[ end trace 0000000000000000 ]---
>>
>> I hope the above information will help fixing the driver.
>
> Yes this is catching an actual issue in the driver in that the PHY
> state machine is still running while the system is trying to suspend.
> We could go about fixing it in a different number of ways, though I
> believe this one is probably correct enough to work and fix the warning:

Right, this fixes the warning (after fixing the minor compile issue, see
below).

Reported-by: Marek Szyprowski <m.szyprowski@xxxxxxxxxxx>

Tested-by: Marek Szyprowski <m.szyprowski@xxxxxxxxxxx>

>
> diff --git a/drivers/net/ethernet/smsc/smsc911x.c
> b/drivers/net/ethernet/smsc/smsc911x.c
> index 3bf20211cceb..e9c0668a4dc0 100644
> --- a/drivers/net/ethernet/smsc/smsc911x.c
> +++ b/drivers/net/ethernet/smsc/smsc911x.c
> @@ -1037,6 +1037,8 @@ static int smsc911x_mii_probe(struct net_device
> *dev)
>                 return ret;
>         }
>
> +       /* Indicate that the MAC is responsible for managing PHY PM */
> +       phydev->mac_managed_pm = true;
>         phy_attached_info(phydev);
>
>         phy_set_max_speed(phydev, SPEED_100);
> @@ -2587,6 +2589,8 @@ static int smsc911x_suspend(struct device *dev)
>         if (netif_running(ndev)) {
>                 netif_stop_queue(ndev);
>                 netif_device_detach(ndev);
> +               if (!device_may_wakeup(dev))
> +                       phy_suspend(dev->phydev);
                        phy_suspend(ndev->phydev);
> }
>
>         /* enable wake on LAN, energy detection and the external PME
> @@ -2628,6 +2632,8 @@ static int smsc911x_resume(struct device *dev)
>         if (netif_running(ndev)) {
>                 netif_device_attach(ndev);
>                 netif_start_queue(ndev);
> +               if (!device_may_wakeup(dev))
> +                       phy_resume(dev->phydev);
                        phy_resume(ndev->phydev);
> }
>
>         return 0;
>
Best regards
--
Marek Szyprowski, PhD
Samsung R&D Institute Poland