On 09/04/2025 4:56 pm, Naresh Kamboju wrote:
On Wed, 2 Apr 2025 at 21:04, Robin Murphy <robin.murphy@xxxxxxx> wrote:
On 31/03/2025 5:03 am, Naresh Kamboju wrote:
Regression: on arm64 Juno-r2 devices the SSD detection tests fail on
Linux next and Linux mainline.
First seen on v6.14-7245-g5c2a430e8599
Good: v6.14
Bad: v6.14-7422-gacb4f33713b9
Sorry, I can't seem to reproduce this on my end, both today's mainline
and acb4f33713b9 with my config, and even acb4f33713b9 with the linked
LKFT config, all work OK on my Juno r2 (using a SATA SSD and PCIe
networking). The only thing which stands out in your log is that PCI
seems to give up probing and assigning resources beyond the switch
downstream ports (so SATA and ethernet are never discovered), whereas on
mine it does[2]. However that all happens before the first IOMMU
instance probes (which conveniently is the PCIe one), so it's hard to
imagine how that could have an effect anyway...
The only obvious difference is that I'm using EDK2 rather than U-Boot,
so that's done all the PCIe configuration once already, but it doesn't
seem like that's significant - looking back at a random older log[1],
the on-board endpoints were still being picked up right after
reconfiguring the switch, well before the IOMMU comes into the picture.
Since it is still an issue on mainline and next, I bisected it and
reverted the following patch. With the revert the kernel warns at boot
time, but it does find the SSD drive:
[bcb81ac6ae3c2ef95b44e7b54c3c9522364a245c]
iommu: Get DT/ACPI parsing into the proper probe path
pcieport 0000:00:00.0: late IOMMU probe at driver bind, something fishy here!
WARNING: at drivers/iommu/iommu.c:559 __iommu_probe_device
I see boot warnings [1]
I am happy to test debug patches if you have any.
Seeing the warning after reverting the commit which introduced the
warning mostly just means the conflict resolution in the revert wasn't
right (there were some subsequent fixups...)
Anyway, I have now managed to get my Juno booting with the same antique
version of U-Boot and finally reproduce the issue. It seems to be
somehow connected to bus->dma_configure() being called in the
device_add() notifier (even though the rest of the IOMMU setup doesn't
run at that point since the driver hasn't registered yet), but how and
why that prevents the buses behind the switch downstream ports being
probed, and why *that* only happens when the switch isn't already
configured, remains a mystery so far. I'm still digging...