Re: [PATCH stable-4.19.y] net: phy: reschedule state machine if AN has not completed in PHY_AN state

From: Heiner Kallweit
Date: Sat May 30 2020 - 18:36:49 EST


On 30.05.2020 23:43, Vladimir Oltean wrote:
> From: Vladimir Oltean <vladimir.oltean@xxxxxxx>
>
> In kernel 4.19 (and probably earlier too) there are issues surrounding
> the PHY_AN state.
>
> For example, if a PHY is in PHY_AN state and AN has not finished, then
> what is supposed to happen is that the state machine gets rescheduled
> until it is, or until the link_timeout reaches zero which triggers an
> autoneg restart process.
>
> But actually the rescheduling never works if the PHY uses interrupts,
> because the condition under which rescheduling occurs is just if
> phy_polling_mode() is true. So basically, this whole rescheduling
> functionality works for AN-not-yet-complete just by mistake. Let me
> explain.
>
> Most of the time the AN process manages to finish by the time the
> interrupt has triggered. One might say "that should always be the case,
> otherwise the PHY wouldn't raise the interrupt, right?".
> Well, some PHYs implement an .aneg_done method which allows them to tell
> the state machine when the AN is really complete.
> The AR8031/AR8033 driver (at803x.c) is one such example. Even when
> copper autoneg completes, the driver still keeps the "aneg_done"
> variable unset until in-band SGMII autoneg finishes too (there is no
> interrupt for that). So we have the premises of a race condition.
>
That's not nice from the PHY:
It signals "link up", and if the system asks the PHY for link details,
then it sheepishly says "well, link is *almost* up".

Question would be whether the same happens with other SGMII-capable
PHY's so that we need to cater for this scenario in phylib.
Or whether we consider it a chip quirk. In the latter case a custom
read_status() handler might do the trick too: if link is reported
as up then wait until aneg is signaled as done too before reading
further link details.

And it's interesting that nobody else stumbled across this problem
before. I mean the PHY we talk about isn't really new. Or is your
use case so special?

> In practice, what really happens depends on the log level of the serial
> console. If the log level is verbose enough that kernel messages related
> to the Ethernet link state are printed to the console, then this gives
> in-band AN enough time to complete, which means the link will come up
> and everyone will be happy. But if the console is not that verbose, the
> link will sometimes come up, and sometimes will be forced down by the
> .aneg_done of the PHY driver (forever, since we are not rescheduling).
>
> The conclusion is that an extra condition needs to be explicitly added,
> so that the state machine can be rescheduled properly. Otherwise PHY
> devices in interrupt mode will never work properly if they have an
> .aneg_done callback.
>
> In more recent kernels, the whole PHY_AN state was removed by Heiner
> Kallweit in the "[net-next,0/5] net: phy: improve and simplify phylib
> state machine" series here:
>
> https://patchwork.ozlabs.org/cover/994464/
>
> and the problem was just masked away instead of being addressed with a
> punctual patch.
>
> Fixes: 76a423a3f8f1 ("net: phy: allow driver to implement their own aneg_done")
> Signed-off-by: Vladimir Oltean <vladimir.oltean@xxxxxxx>
> ---
> I'm not sure the procedure I'm following is correct, sending this
> directly to Greg. The patch doesn't apply on net.
>
> drivers/net/phy/phy.c | 6 ++++--
> 1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
> index cc454b8c032c..ca4fd74fd2c8 100644
> --- a/drivers/net/phy/phy.c
> +++ b/drivers/net/phy/phy.c
> @@ -934,7 +934,7 @@ void phy_state_machine(struct work_struct *work)
> struct delayed_work *dwork = to_delayed_work(work);
> struct phy_device *phydev =
> container_of(dwork, struct phy_device, state_queue);
> - bool needs_aneg = false, do_suspend = false;
> + bool recheck = false, needs_aneg = false, do_suspend = false;
> enum phy_state old_state;
> int err = 0;
> int old_link;
> @@ -981,6 +981,8 @@ void phy_state_machine(struct work_struct *work)
> phy_link_up(phydev);
> } else if (0 == phydev->link_timeout--)
> needs_aneg = true;
> + else
> + recheck = true;
> break;
> case PHY_NOLINK:
> if (!phy_polling_mode(phydev))
> @@ -1123,7 +1125,7 @@ void phy_state_machine(struct work_struct *work)
> * PHY, if PHY_IGNORE_INTERRUPT is set, then we will be moving
> * between states from phy_mac_interrupt()
> */
> - if (phy_polling_mode(phydev))
> + if (phy_polling_mode(phydev) || recheck)
> queue_delayed_work(system_power_efficient_wq, &phydev->state_queue,
> PHY_STATE_TIME * HZ);
> }
>