Re: [PATCH net-next v4 05/12] net: ethernet: oa_tc6: implement error interrupts unmasking

From: Parthiban.Veerasooran
Date: Thu May 02 2024 - 06:10:42 EST


Hi Ramon,

On 01/05/24 11:59 pm, Ramón Nordin Rodriguez wrote:
> EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe
>
>>> n | name | min | avg | max | rx dropped | samples
>>> 1 | no mod | 827K | 846K | 891K | 945 | 5
>>> 2 | no log | 711K | 726K | 744K | 562 | 5
>>> 3 | less irq | 815K | 833K | 846K | N/A | 5
>>> 4 | no irq | 914K | 924K | 931K | N/A | 5
>>> 5 | simple | 857K | 868K | 879K | 615 | 5
>>
>> That is odd.
>>
>> Side question: What CONFIG_HZ= do you have? 100, 250, 1000? Try
>> 1000. I've seen problems where the driver wants to sleep for a short
>> time, but the CONFIG_HZ value limits how short a time it can actually
>> sleep. It ends up sleeping much longer than it wants.
>>
>
> I have been doing my best to abuse the link some more. In brief tweaking
> CONFIG_HZ has some but limited effect.
> Saturating the link with the rx buffer interrupt enabled breaks the driver.
> Saturating the link with the rx buffer interrupt disabled has poor
> performance.
>
> The following scenario has been tested. Both ends of the link run:
> * server.py
> * client.py
>
> One end is an arm64 quad core running at 1.2GHz with the lan8650 macphy.
> The other end is an amd 3950x running the lan8670 usb eval board.
> Both systems should be fast enough that running python should not be a
> limiting factor.
>
> -- The test code --
> server.py
> #!/bin/env python3
> import socket
>
> def serve(sock: socket.socket):
> while True:
> client, addr = sock.accept()
> print(f'connection from: {addr}')
> while len(client.recv(2048)) > 0:
> pass
> print('client disconnected')
> client.close()
>
> if __name__ == '__main__':
> sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
> sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
> sock.bind(('0.0.0.0', 4040))
> sock.listen(1)
> serve(sock)
> print("something went wrong")
>
> client.py
> #!/bin/env python3
> import socket
> import sys
>
> if __name__ == '__main__':
> sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
> sock.connect((sys.argv[1], 4040))
>
> while True:
> sock.sendall(b'0'*2048)
>
> -- test runs --
> run 1 - all interrupts enabled
> Time to failure:
> 1 min or less
>
> Kernel output:
> [ 94.361312] sched: RT throttling activated
>
> top output:
> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> 145 root -51 0 0 0 0 R 95.5 0.0 1:11.22 oa-tc6-spi-thread
>
> link stats:
> 3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
> link/ether 32:c2:7e:22:93:99 brd ff:ff:ff:ff:ff:ff
> RX: bytes packets errors dropped missed mcast
> 3371902 7186 0 48 0 0
> RX errors: length crc frame fifo overrun
> 0 0 0 0 0
> TX: bytes packets errors dropped carrier collsns
> 10341438 8071 0 0 0 0
> TX errors: aborted fifo window heartbt transns
> 0 0 0 0 1
> state:
> Completly borked, can't ping in or out, bringing the interface down then up
> has no effect.
> There is no SPI clock and no interrupts generated by the mac-phy.
> The worker thread seems to have live locked.
>
> -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
> run 2 - RX_BUFFER_OVERLOW interrupt disabled
>
> state:
> Runs just fine but the oa-tc6-spi-thread is consuming 10-20% cpu
> Ping times have increased from 1-2ms to 8-35ms
>
>
> -- additional notes --
> When tweaking CONFIG_HZ I do get some changes in behaviour, the cpu
> consumption stays stable at 20%+-2 with CONFIG_HZ=250, when increased to
> CONFIG_HZ=1000 it jumps up and down between 10-20%.
>
> I don't have access to a logic analyzer but my old oscilloscope is
> almost reliable. I could confirm that the spi clock is indeed running at
> the expected 25MHz, but I could observe some gaps of up to 320µs so
> that's 8k spi cycles spent doing something else.
> These gaps were observed on the SPI clock and the macphy interrupt was
> active for the same ammount of time(though this was measured independently
> and not on the same trigger).
> I've been drinking way to much coffe, so soldering is not gonna happen
> today (shaky hands), but if it helps I can solder wires to attach both
> probes to confirm that the gap in the SPI clock happens at the same time
> or not as the interrupt is active.
>
> I'd be keen on hearing what Microchips plans to address. If tracking
> down performance issues is a priority I'll probably not spend any time
> on it, if not then I'll definetly dig into it more.
I tried to reproduce the issue in my setup with your above applications.
But surprisingly I couldn't reproduce the issue you are facing.

One end is Raspberry Pi 4 with lan8651 MAC-PHY and the other end is
Raspberry Pi 4 with EVB-LAN8670-USB Stick.

lan8651 MAC-PHY side:
---------------------

pi@raspberrypi:~/lan865x/v4_mainline $ python3 server.py
connection from: ('192.168.5.101', 46608)

Top output:
-----------
top - 14:28:14 up 1:14, 4 users, load average: 0.68, 0.67, 0.63
Tasks: 201 total, 1 running, 200 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.2 us, 1.1 sy, 0.0 ni, 98.7 id, 0.0 wa, 0.0 hi, 0.1 si,
0.0 st
MiB Mem : 7810.0 total, 7110.5 free, 392.3 used, 432.4
buff/cache
MiB Swap: 100.0 total, 100.0 free, 0.0 used. 7417.7 avail Mem

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+
COMMAND

7219 root -51 0 0 0 0 S 2.7 0.0 1:13.11
oa-tc6-spi-thread

35307 pi 20 0 16280 9216 5248 S 0.7 0.1 0:16.53
python3

88831 pi 20 0 11728 4864 2816 R 0.3 0.1 0:00.22
top

89819 vnc 20 0 2320 384 384 S 0.3 0.0 0:00.04
sh

1 root 20 0 168652 11580 8436 S 0.0 0.1 0:03.85
systemd

2 root 20 0 0 0 0 S 0.0 0.0 0:00.01
kthreadd

3 root 20 0 0 0 0 S 0.0 0.0 0:00.00
pool_workqueue_release

4 root 0 -20 0 0 0 I 0.0 0.0 0:00.00
kworker/R-rcu_g

5 root 0 -20 0 0 0 I 0.0 0.0 0:00.00
kworker/R-rcu_p

6 root 0 -20 0 0 0 I 0.0 0.0 0:00.00
kworker/R-slub_

7 root 0 -20 0 0 0 I 0.0 0.0 0:00.00
kworker/R-netns

11 root 20 0 0 0 0 I 0.0 0.0 0:00.03
kworker/u8:0-ext4-rsv-conversion

12 root 0 -20 0 0 0 I 0.0 0.0 0:00.00
kworker/R-mm_pe

ifconfig output:
----------------
eth1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 192.168.5.100 netmask 255.255.255.0 broadcast 0.0.0.0
ether 04:05:06:01:02:03 txqueuelen 1000 (Ethernet)
RX packets 1589879 bytes 2391045582 (2.2 GiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 1022419 bytes 71954905 (68.6 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
device interrupt 55

dmesg output:
-------------
pi@raspberrypi:~/lan865x/v4_mainline $ dmesg
[ 234.019968] LAN865X Rev.B0 Internal Phy spi0.0:00: attached PHY
driver (mii_bus:phy_addr=spi0.0:00, irq=POLL)
[ 234.069387] lan865x spi0.0 eth1: Link is Up - 10Mbps/Half - flow
control off

EVB-LAN8670-USB stick side:
---------------------------
pi@pv-rpi-tp2:~/microchip/t1s-usb $ python3 client.py 192.168.5.100

ifconfig output:
----------------
eth1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 192.168.5.101 netmask 255.255.255.0 broadcast 192.168.5.255
ether 00:1e:c0:d1:ca:39 txqueuelen 1000 (Ethernet)
RX packets 1034606 bytes 58330335 (55.6 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 1722400 bytes 2628199197 (2.4 GiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

Let me know if you want to try anything more.

Best regards,
Parthiban V
>
> Let me know if anything is unclear or if I can help out with anything
> specific.
>
> R