Re: [RFC PATCH v3 2/3] PCI: rockchip-host: Retry link training on failure without PERST#

From: Shawn Lin
Date: Fri Jul 18 2025 - 00:02:17 EST


On 2025/07/18 (Friday) 11:33, Geraldo Nascimento wrote:
> On Fri, Jul 18, 2025 at 09:55:42AM +0800, Shawn Lin wrote:
>> Hi Geraldo,
>>
>> On 2025/06/11 (Wednesday) 3:05, Geraldo Nascimento wrote:
>>> After almost 30 days of battling the buggy RK3399 PCIe on my Rock Pi
>>> N10 with trial-and-error debugging, I finally got positive results,
>>> with both a Realtek 8111E NIC and a Samsung PM981a SSD enumerating on
>>> the PCI bus.
>>>
>>> The NIC was connected through an M.2->PCIe x4 riser card, and it would
>>> get stuck in Polling.Compliance without breaking electrical idle on
>>> the host RX side. The Samsung PM981a SSD is connected directly to the
>>> M.2 connector; that SSD is known to be quirky (OEM... no support) and
>>> non-functional on the RK3399 platform.
>>>
>>> The Samsung SSD was even worse than the NIC: it would get stuck in
>>> Detect.Active like a bricked card, even though it was fully functional
>>> via a USB adapter.
>>>
>>> It seems both devices benefit from retrying link training if (big if
>>> here) PERST# is not toggled during the retry.


>> I haven't seen this error before, especially given that the RTL8111 NIC
>> is widely used by customers.

> Hi Shawn, great to hear from you!
>
> Note that my board exposes PCIe only via an NVMe connector, not
> directly via a proper PCIe slot, so I have to use an inexpensive riser
> card that provides a proper PCIe connector.
>
> I mention this because, while I don't doubt that the RTL8111 NIC works
> out of the box on boards that expose a PCIe connector directly, the
> riser card plus NIC combination behaves similarly (though not
> identically, as described above) to known-good SSDs that simply refuse
> to work with the Rockchip PCIe IP.
>
> I admit that patch 1 looks a little crazy, but it has the effect of
> enabling presently non-working devices, or combinations of devices,
> on this IP, at least on the board I have access to.


>> Could you help try this?
>> [1] apply your patch 3 first

> Sure, I'm always open to testing, but could you clarify the patch 3
> part? AFAIK this series of mine only has 2 patches, so I'm a little
> confused about exactly which patch to apply as a preliminary step.

Patch 3 refers to "arm64: dts: rockchip: drop PCIe 3v3 always-on and
boot-on", which lets the kernel fully control the power in case the
firmware has already enabled it in advance.


> Also, since you're asking me to test some code, I think it is only fair
> that I ask you to test mine, too. It shouldn't be too hard for you to
> find an otherwise working NVMe SSD that refuses to complete link
> training with the current code. Please connect such an SSD to an RK3399
> board and let us know whether my proposed change does anything to
> ameliorate the long-standing issue of SSDs that refuse to cooperate.

Sure. I don't have a Samsung PM981a SSD now, but I could test all my
SSDs to see whether I can find one that won't work.


> Thank you,
> Geraldo Nascimento