Re: [PATCH] net: stmmac: don't stop RXC during LPI

From: Florian Fainelli
Date: Sun Jan 23 2022 - 13:29:59 EST




On 1/23/2022 8:09 AM, Jisheng Zhang wrote:
On Mon, Jan 24, 2022 at 12:08:22AM +0800, Jisheng Zhang wrote:
On Sun, Jan 23, 2022 at 04:52:29PM +0100, Andrew Lunn wrote:
On Sun, Jan 23, 2022 at 10:12:45PM +0800, Jisheng Zhang wrote:
I hit an issue where RX packets can't be received, with the steps below:
0. Plug in the ethernet cable, then boot normally and get an IP from the DHCP server
1. Quickly hotplug the ethernet cable out and back in
2. Trigger the DHCP client to renew the lease

tcpdump shows that the request tx pkt is sent out successfully,
but the mac can't receive the rx pkt.

The issue can easily be reproduced on platforms with PHY_POLL external
phy. If we don't allow the phy to stop the RXC during LPI, the issue
is gone. I think it's unsafe to stop the RXC during LPI because the mac
needs RXC clock to support RX logic.

And the 2nd param clk_stop_enable of phy_init_eee() is a bool, so use
false instead of 0.

Signed-off-by: Jisheng Zhang <jszhang@xxxxxxxxxx>
---
drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 6708ca2aa4f7..92a9b0b226b1 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -1162,7 +1162,7 @@ static void stmmac_mac_link_up(struct phylink_config *config,
stmmac_mac_set(priv, priv->ioaddr, true);
if (phy && priv->dma_cap.eee) {
- priv->eee_active = phy_init_eee(phy, 1) >= 0;
+ priv->eee_active = phy_init_eee(phy, false) >= 0;

This has not caused issues in the past. So i'm wondering if this is
somehow specific to your system? Does everybody else use a PHY which
does not implement this bit? Does your synthesis of the stmmac have a
different clock tree?

By changing this value for every instance of the stmmac, you are
potentially causing a power regression for stmmac implementations
which don't need the clock. So we need a clear understanding, stopping
the clock is wrong in general and so the change is correct in

I think this is a common issue because the MAC needs phy's RXC for RX
logic. But it's better to let other stmmac users verify. The issue
can easily be reproduced on platforms with PHY_POLL external phy.
Or do other platforms use a dedicated clock, rather than the clock from
the phy, for the MAC's RX logic?

If the issue turns out specific to my system, then I will send out
a new patch to adopt your suggestion.


+ Joakim

Hi Joakim, IIRC, you have a stmmac + external RTL8211F phy platform, but
I'm not sure whether your platform has an irq for the phy. Could you
help me check whether you can reproduce the issue on your platform?

general. Or this is specific to your system, and you probably need to
add priv->dma_cap.keep_rx_clock_ticking, which you set in your glue
driver, and use here to decide what to pass to phy_init_eee().
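
As a rough illustration of the suggestion above, the following standalone C sketch models how a per-platform flag could drive the second argument of phy_init_eee(). Everything here is a mock-up: the struct types, phy_init_eee_sketch() and link_up_eee_sketch() are hypothetical stand-ins rather than kernel code, and keep_rx_clock_ticking is only the field name proposed in this thread, not an existing member of dma_cap.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the MAC capability struct; only the fields used here. */
struct dma_features_sketch {
	bool eee;                   /* MAC supports EEE */
	bool keep_rx_clock_ticking; /* proposed flag, set by the glue driver */
};

/* Hypothetical stand-in for a PHY device; records what the caller asked for. */
struct phy_sketch {
	bool clk_stop_requested;
};

/* Mock of phy_init_eee(): remembers the clock-stop request and reports success. */
static int phy_init_eee_sketch(struct phy_sketch *phy, bool clk_stop_enable)
{
	phy->clk_stop_requested = clk_stop_enable;
	return 0;
}

/*
 * Models the EEE part of the link-up path: allow the PHY to stop RXC
 * during LPI unless the platform says the MAC needs the clock to keep running.
 * Returns the value that would be stored in eee_active.
 */
static bool link_up_eee_sketch(const struct dma_features_sketch *cap,
			       struct phy_sketch *phy)
{
	if (phy == NULL || !cap->eee)
		return false;

	return phy_init_eee_sketch(phy, !cap->keep_rx_clock_ticking) >= 0;
}
```

With this shape, platforms that do not set the flag keep today's behaviour (clock stop requested), so only the affected glue drivers would give up the power saving.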

I suspect the problem is only or largely relevant in an RGMII configuration whereby the TXC of the MAC is an input to the PHY, which then re-generates the RXC and feeds it back to the MAC as RXC (with the configured delay). If the PHY stops its clock, then the MAC no longer gets an RXC, and all sorts of problems would arise if the MAC logic on the RX side is dependent upon getting the PHY's RXC to be re-sampled internally within the MAC.

Now, this would be symptomatic of a fairly naive design on the MAC side to support EEE. Usually, to really save power while in LPI, you would want to switch your MAC from its main or fast clock (which is presumably at least 250MHz to support Gigabit rates and generate a 125MHz TXC) to a slow clock (say 25 or 27MHz), even if the bulk of the power is in the PHY's analog logic. When the PHY signals that we are out of LPI, the MAC switches back to its main clock. This may occur with the help of the MAC driver, or it can sometimes be done autonomously.

So with all that theory about how things should be designed, I think you need to investigate this problem a bit more thoroughly.

FWIW phy_init_eee()'s second argument is improperly designed. Before deciding to stop the PHY's RX clock, you should first know whether the PHY supports it to begin with, otherwise you are requesting something the PHY is not able to do, and there is no feedback mechanism. A while back I had started this patch series which may still be relevant:

https://github.com/ffainelli/linux/commits/phy-eee-tx-clk
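
To make the "know whether the PHY supports it first" point concrete, here is a minimal standalone C sketch of a capability check with feedback to the caller. It is not taken from the patch series linked above. It assumes the IEEE 802.3 Clause 45 PCS register layout, where bit 6 of PCS status 1 (3.1.6) reports "clock stop capable" and bit 10 of PCS control 1 (3.0.10) is "clock stop enable". The macro, struct and function names are hypothetical, and register values are passed in directly instead of being read over MDIO.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed Clause 45 PCS bit positions (MMD 3); names are local to this sketch. */
#define PCS_STAT1_CLKSTOP_CAP 0x0040u /* 3.1.6: clock stop capable */
#define PCS_CTRL1_CLKSTOP_EN  0x0400u /* 3.0.10: clock stop enable */

/* Result handed back to the caller so it learns whether the request took effect. */
struct clk_stop_result {
	uint16_t ctrl1; /* value that would be written back to PCS control 1 */
	bool applied;   /* true only if clock stop was requested and is supported */
};

/*
 * Decide whether to set the clock-stop-enable bit.
 * stat1: current PCS status 1 value, ctrl1: current PCS control 1 value,
 * want:  whether the MAC driver would like the PHY to stop RXC during LPI.
 */
static struct clk_stop_result request_clk_stop(uint16_t stat1, uint16_t ctrl1,
					       bool want)
{
	struct clk_stop_result r = { .ctrl1 = ctrl1, .applied = false };

	if (want && (stat1 & PCS_STAT1_CLKSTOP_CAP)) {
		r.ctrl1 |= PCS_CTRL1_CLKSTOP_EN;
		r.applied = true;
	}
	return r;
}
```

The caller can then tell "clock stop was requested but the PHY cannot do it" apart from "clock stop is active", which is the feedback the current boolean argument does not provide.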
--
Florian