Re: [Intel-wired-lan] [PATCH] e1000e: fix link fluctuations problem

From: En-Wei WU
Date: Fri May 03 2024 - 03:44:33 EST


Dear Paul Menzel,

Thank you for your quick response.

> Do you mean ho*t*-plugging?
> Increas*ing*?

Yes, sorry about the misspelling.

> Could you please document what NICs you saw this
Yes. I saw this on the Intel I219-LM. I haven't seen this bug on other NICs.

> and if it is documented in any datasheet/errata?
No, we couldn't find any datasheet/errata documenting this.

> Does this have any downsides on systems with non-buggy hardware?
No. I've tested the patch on other, non-buggy hardware (such as the
I219-V), and it has no effect there.

> Could you please split this hunk into a separate patch?
Sure! I'll send the v2 patchset soon.
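
To illustrate what that hunk changes, here is a minimal userspace sketch
of a poll loop that delays before every read. It is not the driver code:
struct sim_phy, sim_link_bit(), sim_phy_has_link() and all timings are
hypothetical, the delay only advances a simulated clock, and the 10
iterations are an assumed value chosen for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical simulated PHY; names and timings are illustrative only. */
struct sim_phy {
	uint64_t now_usec;     /* simulated clock */
	uint64_t glitch_start; /* link bit reads 0 in [glitch_start, glitch_end) */
	uint64_t glitch_end;
};

/* Read the simulated Link Status bit at the current simulated time. */
static bool sim_link_bit(const struct sim_phy *phy)
{
	return !(phy->now_usec >= phy->glitch_start &&
		 phy->now_usec < phy->glitch_end);
}

/* Poll for link, delaying before each read.
 * Advancing the clock stands in for msleep()/udelay().
 */
static bool sim_phy_has_link(struct sim_phy *phy, uint32_t iterations,
			     uint32_t usec_interval)
{
	uint32_t i;

	for (i = 0; i < iterations; i++) {
		phy->now_usec += usec_interval;
		if (sim_link_bit(phy))
			return true;
	}
	return false;
}

/* Compare one immediate read with a delayed poll during a 40 ms glitch. */
static void sim_demo(void)
{
	struct sim_phy a = { 30000, 20000, 60000 };
	struct sim_phy b = { 30000, 20000, 60000 };

	/* One iteration, no delay: the read lands inside the glitch. */
	assert(!sim_phy_has_link(&a, 1, 0));

	/* Delay first: the first read happens after the glitch has ended. */
	assert(sim_phy_has_link(&b, 10, 100000));
	assert(b.now_usec == 130000);
}
```

In this model, a single read with no delay observes the transient 0,
while the delayed poll performs its first read 100 ms later and sees
the link as up.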

> Are there any other public bug reports and discussions you could reference?
No. We have an internal bug report, but it cannot be made public.

Thank you for your time.

On Fri, 3 May 2024 at 13:34, Paul Menzel <pmenzel@xxxxxxxxxxxxx> wrote:
>
> [Fix address jesse.brandeburg@xxxxxxxx*m*]
>
>
> Dear Ricky,
>
>
> Thank you for your patch.
>
>
> Am 02.05.24 um 11:12 schrieb Ricky Wu:
> > As described in https://bugzilla.kernel.org/show_bug.cgi?id=218642,
> > some e1000e NIC reports link up -> link down -> link up when hog-plugging
>
> Do you mean ho*t*-plugging?
>
> > the Ethernet cable.
> >
> > The problem is because the unstable behavior of Link Status bit in
> > PHY Status Register of some e1000e NIC. When we re-plug the cable,
> > the e1000e_phy_has_link_generic() (called after the Link-Status-Changed
> > interrupt) has read this bit with 1->0->1 (1=link up, 0=link down)
> > and e1000e reports it to net device layer respectively.
>
> Wow. I guess this was “fun” to debug. Could you please document, what
> NICs you saw this, and if it is documented in any datasheet/errata?
>
> > This patch solves the problem by passing polling delays on
> > e1000e_phy_has_link_generic() so that it will not get the unstable
> > states of Link Status bit.
>
> Does this have any downsides on systems with non-buggy hardware?
>
> > Also, the sleep codes in e1000e_phy_has_link_generic() only take
> > effect when error occurs reading the MII register. Moving these codes
> > forward to the beginning of the loop so that the polling delays passed
> > into this function can take effect on any situation.
>
> Could you please split this hunk into a separate patch?
>
> Should a Fixes: tag be added?
>
> Are there any other public bug reports and discussions you could reference?
>
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=218642
>
> > Signed-off-by: Ricky Wu <en-wei.wu@xxxxxxxxxxxxx>
> > ---
> > drivers/net/ethernet/intel/e1000e/ich8lan.c | 5 ++++-
> > drivers/net/ethernet/intel/e1000e/phy.c | 10 ++++++----
> > 2 files changed, 10 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/intel/e1000e/ich8lan.c b/drivers/net/ethernet/intel/e1000e/ich8lan.c
> > index f9e94be36e97..c462aa6e6dee 100644
> > --- a/drivers/net/ethernet/intel/e1000e/ich8lan.c
> > +++ b/drivers/net/ethernet/intel/e1000e/ich8lan.c
> > @@ -1427,8 +1427,11 @@ static s32 e1000_check_for_copper_link_ich8lan(struct e1000_hw *hw)
> > /* First we want to see if the MII Status Register reports
> > * link. If so, then we want to get the current speed/duplex
> > * of the PHY.
> > + * Some PHYs have link fluctuations with the instability of
> > + * Link Status bit (BMSR_LSTATUS) in MII Status Register.
> > + * Increase the iteration times and delay solves the problem.
>
> Increas*ing*?
>
> > */
> > - ret_val = e1000e_phy_has_link_generic(hw, 1, 0, &link);
> > + ret_val = e1000e_phy_has_link_generic(hw, COPPER_LINK_UP_LIMIT, 100000, &link);
>
> Could you please document how 100000 was chosen?
>
> > if (ret_val)
> > goto out;
> >
> > diff --git a/drivers/net/ethernet/intel/e1000e/phy.c b/drivers/net/ethernet/intel/e1000e/phy.c
> > index 93544f1cc2a5..ef056363d721 100644
> > --- a/drivers/net/ethernet/intel/e1000e/phy.c
> > +++ b/drivers/net/ethernet/intel/e1000e/phy.c
> > @@ -1776,7 +1776,13 @@ s32 e1000e_phy_has_link_generic(struct e1000_hw *hw, u32 iterations,
> > u16 i, phy_status;
> >
> > *success = false;
> > +
> > for (i = 0; i < iterations; i++) {
> > + if (usec_interval >= 1000)
> > + msleep(usec_interval / 1000);
> > + else
> > + udelay(usec_interval);
> > +
> > /* Some PHYs require the MII_BMSR register to be read
> > * twice due to the link bit being sticky. No harm doing
> > * it across the board.
> > @@ -1799,10 +1805,6 @@ s32 e1000e_phy_has_link_generic(struct e1000_hw *hw, u32 iterations,
> > *success = true;
> > break;
> > }
> > - if (usec_interval >= 1000)
> > - msleep(usec_interval / 1000);
> > - else
> > - udelay(usec_interval);
> > }
> >
> > return ret_val;