Re: R8169: Network lockups in 4.18.{8,9,10} (and 4.19 dev)

From: Chris Clayton
Date: Sat Sep 29 2018 - 03:39:09 EST


Sorry, sent by accident. Note to self - don't attempt email until after second cup of coffee.

On 29/09/2018 08:25, Chris Clayton wrote:
>
>
> On 28/09/2018 23:13, Heiner Kallweit wrote:
>> On 29.09.2018 00:00, Chris Clayton wrote:
>>> Thanks Maciej.
>>>
>>> On 28/09/2018 16:54, Maciej S. Szmigiero wrote:
>>>> Hi,
>>>>
>>>>> Hi,
>>>>>
>>>>> I upgraded my kernel to 4.18.10 recently and have since been experiencing network problems after resuming from a
>>>>> suspend to RAM or disk. I previously had 4.18.6 and that was OK.
>>>>>
>>>>> The pattern of the problem is that when I first boot, the network is fine. But, after resume from suspend I find that
>>>>> the time taken for a ping of one of my ISP's nameservers increases from 14-15ms to more than 1000ms. Moreover, when I
>>>>> open a browser (chromium or firefox), it fails to retrieve my home page (https://www.google.co.uk) and pings of the
>>>>> nameserver fail with the message "Destination Host Unreachable". Often, I can revive the network by stopping it with
>>>>> /sbin/if{down,up} but sometimes it is necessary to also remove the r8169 module and load it again.
>>>>
>>>> Please have a look at the following thread:
>>>> https://lkml.org/lkml/2018/9/25/1118
>>>>
>>>
>>> I applied your patch for the 4.18 stable kernels to 4.18.10, but the problem is not solved by it. Similarly, I applied
>>> Heiner's patch to 4.19, but again the problem is not solved.
>>>
>> I think we are talking about two different issues here. The one the fix is for has no link to suspend/resume.
>>
>> Chris, the lspci output doesn't provide enough detail to determine the exact chip version.
>> Can you provide the dmesg part with the XID?

I meant to say that I have now re-enabled MSI in 4.18.7 - the latest stable series kernel in which eth0 continues to
function reliably after a suspend/resume cycle. The second dmesg output below is taken from that kernel. The first one
was from an up-to-date 4.19 kernel.
>
> $ dmesg | grep -i r8169
> [ 5.320679] r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
> [ 5.321432] r8169 0000:05:00.2: can't disable ASPM; OS doesn't have ASPM control
> [ 5.322892] r8169 0000:05:00.2 eth0: RTL8411, 80:fa:5b:08:d0:3d, XID 48800800, IRQ 19
> [ 5.323786] r8169 0000:05:00.2 eth0: jumbo features [frames: 9200 bytes, tx checksumming: ko]
> [ 10.232077] r8169 0000:05:00.2 eth0: No native access to PCI extended config space, falling back to CSI
> [ 10.235218] r8169 0000:05:00.2 eth0: link down
> [ 11.717460] r8169 0000:05:00.2 eth0: link up
>
> $ dmesg | grep -i r8169
> [ 5.208040] r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
> [ 5.208677] r8169 0000:05:00.2: can't disable ASPM; OS doesn't have ASPM control
> [ 5.210066] r8169 0000:05:00.2 eth0: RTL8411, 80:fa:5b:08:d0:3d, XID 48800800, IRQ 29
> [ 5.210676] r8169 0000:05:00.2 eth0: jumbo features [frames: 9200 bytes, tx checksumming: ko]
> [ 10.456081] r8169 0000:05:00.2 eth0: No native access to PCI extended config space, falling back to CSI
> [ 10.459217] r8169 0000:05:00.2 eth0: link down
> [ 10.459880] r8169 0000:05:00.2 eth0: link down
> [ 12.015158] r8169 0000:05:00.2 eth0: link up
>
>
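For anyone comparing the two outputs above, the chip name, XID and IRQ can be pulled out of the probe line with something like the sketch below. The helper name r8169_ident is made up for illustration, and the pattern assumes the probe-line format shown in the dmesg output above; other driver versions may print it differently.

```shell
# Hypothetical helper: extract chip name, XID and IRQ from r8169 probe lines.
# Reads dmesg-style text on stdin. Assumes lines of the form
#   "... eth0: RTL8411, <mac>, XID 48800800, IRQ 19"
# Lines that do not match are silently skipped.
r8169_ident() {
    sed -n 's/.*: \(RTL[^,]*\), .*XID \([0-9a-fA-F]*\), IRQ \([0-9]*\).*/\1 XID=\2 IRQ=\3/p'
}

# Typical use (reading the kernel log may need root on some systems):
#   dmesg | r8169_ident
```

Run against the two outputs above, this gives the same chip and XID for both kernels, with IRQ 19 in the first and IRQ 29 in the second, which would be consistent with only the second kernel using MSI.
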
>> According to your lspci output neither MSI nor MSI-X is active.
>> Do you have to use nomsi for whatever reason?
>
> No, I do not use nomsi, but MSI wasn't enabled in my kernel config. I'm 99% sure that it used to be - I've no idea how
> it got dropped. If I'm not sure about an option, I start by taking the recommendation in the kconfig help. Help on MSI
> has a very clear "say Y".

As I said above, I have re-enabled MSI.
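
A rough way to confirm whether MSI is actually active on the device, as a sketch only: the helper name msi_state is made up here, and it assumes the usual "MSI: Enable+" / "MSI: Enable-" capability line that lspci -vv prints.

```shell
# Hypothetical helper: classify the MSI capability found in `lspci -vv`
# output read on stdin. Prints "enabled", "disabled", or "absent".
# MSI-X lines ("MSI-X: Enable...") are deliberately not matched.
msi_state() {
    awk '
        /MSI: Enable\+/ { print "enabled";  found = 1; exit }
        /MSI: Enable-/  { print "disabled"; found = 1; exit }
        END { if (!found) print "absent" }
    '
}

# Typical use (capabilities are usually only visible to root):
#   lspci -vv -s 05:00.2 | msi_state
# The build-time option can be checked separately, assuming the kernel
# exposes its config via /proc/config.gz (CONFIG_IKCONFIG_PROC):
#   zgrep CONFIG_PCI_MSI /proc/config.gz
```

"absent" here only means no MSI capability line was seen, which is also what a non-root lspci run would produce.
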
>
>>
>> Heiner
>>
>>>> Maciej
>>>>
>>> Chris
>>>
>>