Re: Kernel 6.7 regression doesn't boot if using AMD eGPU
From: Vasant Hegde
Date: Tue Apr 16 2024 - 08:44:34 EST
Robin,
On 4/16/2024 4:55 PM, Robin Murphy wrote:
On 2024-04-16 1:39 am, Jason Gunthorpe wrote:
On Mon, Apr 15, 2024 at 10:44:34PM +0100, Robin Murphy wrote:
On 2024-04-15 7:57 pm, Eric Wagner wrote:
Apologies if I made a mistake in the first bisect, I'm new to kernel
debugging.
I tested cedc811c76778bdef91d405717acee0de54d8db5 (x86/amd) and
3613047280ec42a4e1350fdc1a6dd161ff4008cc (core) directly and both were good.
Then I ran git bisect again with e8cca466a84a75f8ff2a7a31173c99ee6d1c59d2
as the bad and 6e6c6d6bc6c96c2477ddfea24a121eb5ee12b7a3 as the good and the
bisect log is attached. It ended up at the same commit as before.
I've also attached a picture of the boot screen that occurs when it hangs.
0000:05:00.0 is the PCIe bus address of the RX 580 eGPU that's causing the
problem.
../...
"Failing" iommu_probe_device is merely how we tell ourselves that we're not
interested in a device, and consequently tell the rest of the kernel it doesn't
have an IOMMU (via device_iommu_mapped() returning false). This is normal and
expected for devices which legitimately have no IOMMU in the first place;
conversely we don't do a great deal for unexpected failures since those
typically represent system-fatal conditions whatever we might try to do. We've
never had much of a notion of expected failures when an IOMMU *is* present, but
even then, denying any trace of the IOMMU and removing ourselves from the
picture is clearly not the ideal way to approach that. We're running off a bus
notifier (or even later), so ultimately our return value is meaningless: at that
point the device already exists and has been added to its bus, and we can't undo that.
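A heavily simplified user-space model may make the point above concrete. It assumes that the rest of the kernel decides whether a device is IOMMU-backed purely from whether probing attached it to a group, which is roughly what device_iommu_mapped() checks. The struct and function names (model_device, model_probe, model_iommu_mapped) are hypothetical and are not kernel APIs.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of a device; not a kernel structure. */
struct model_device {
	const char *name;
	bool on_bus;        /* already true by the time the bus notifier runs */
	void *iommu_group;  /* NULL means "no IOMMU" to everyone else */
};

/*
 * Roughly mirrors the idea behind device_iommu_mapped(): a device counts
 * as IOMMU-backed only if probing left it attached to a group.
 */
static bool model_iommu_mapped(const struct model_device *dev)
{
	return dev->iommu_group != NULL;
}

/*
 * Models a notifier-driven probe. Whatever it returns, the device stays on
 * the bus; a "failure" merely leaves it without a group, so it looks as if
 * it simply has no IOMMU.
 */
static int model_probe(struct model_device *dev, bool iommu_wants_it,
		       void *group)
{
	if (!iommu_wants_it) {
		dev->iommu_group = NULL;
		return -ENODEV;
	}
	dev->iommu_group = group;
	return 0;
}
```

In this model, a device such as 0000:05:00.0 whose probe "fails" is still on the bus but reports no IOMMU, which is why the return value carries no real weight at that point.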
However, it looks to be even more fun if failure occurs in *deferred* default
domain creation via bus_iommu_probe(), since then we give up and dismiss the
entire IOMMU. Except the x86 drivers ignore the return from
iommu_device_register(), so further hilarity ensues...
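The failure mode described above can be sketched as a toy model, under stated assumptions: one failed default-domain allocation in any group makes the core give up on the whole IOMMU, while a driver that ignores the registration result still marks itself active. All names here (model_iommu, model_core_register, model_driver_init) and the -ENOMEM error code are hypothetical illustrations, not the real bus_iommu_probe() or iommu_device_register() code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of one IOMMU instance; not kernel code. */
struct model_iommu {
	bool core_knows_it;   /* the core still considers this IOMMU registered */
	bool driver_active;   /* the driver believes it is up and translating */
};

/*
 * Models deferred default-domain creation: if allocating the default
 * domain for any group fails, the core gives up on the entire IOMMU
 * rather than just the one group.
 */
static int model_core_register(struct model_iommu *iommu,
			       const bool *group_domain_ok, size_t ngroups)
{
	for (size_t i = 0; i < ngroups; i++) {
		if (!group_domain_ok[i]) {
			iommu->core_knows_it = false;
			return -ENOMEM;   /* hypothetical error code */
		}
	}
	iommu->core_knows_it = true;
	return 0;
}

/*
 * Models a driver that, like the x86 drivers described above, ignores the
 * registration result and carries on regardless.
 */
static void model_driver_init(struct model_iommu *iommu,
			      const bool *group_domain_ok, size_t ngroups)
{
	(void)model_core_register(iommu, group_domain_ok, ngroups);
	iommu->driver_active = true;
}
```

Feeding model_driver_init() a single failing group leaves the driver active while the core has dismissed the IOMMU: an inconsistent state of the kind the paragraph above is warning about.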
I think I've now satisfied myself that a simple fix for the core code is
appropriate and will write that up now. One other thing I couldn't quite figure
out is whether the AMD driver somehow prevents PASIDs being used while the group
is attached to a non-identity (and non-nested) domain - that's probably one for
Vasant to confirm.
The AMD driver supports PASID with the following domain types:
- Identity domain
- DMA translation mode (DMA and DMA_FQ) with the AMD v2 page table
(amd_iommu=pgtbl_v2).
Currently amd_iommu_def_domain_type() tries to put PASID-capable devices in
identity domain mode. This is something to fix. It's on my TODO list; I will try
to get to it soon.
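The support list and the current default-domain policy can be encoded as a small decision table. This is a simplified sketch, not the driver's code: the enum and function names are hypothetical, the real amd_iommu_def_domain_type() checks more conditions than PASID capability alone, and treating every domain type not listed above as PASID-incapable is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the domain types named above. */
enum model_domain_type {
	MODEL_DOMAIN_NO_PREFERENCE,  /* let the core pick its default */
	MODEL_DOMAIN_IDENTITY,
	MODEL_DOMAIN_DMA,
	MODEL_DOMAIN_DMA_FQ,
	MODEL_DOMAIN_OTHER,          /* anything not in the list above */
};

enum model_pgtable {
	MODEL_PGTABLE_V1,
	MODEL_PGTABLE_V2,            /* selected with amd_iommu=pgtbl_v2 */
};

/*
 * Encodes the support list above: PASID works with identity domains, or
 * with DMA/DMA_FQ translation only when the v2 page table is in use.
 */
static bool model_pasid_usable(enum model_domain_type type,
			       enum model_pgtable pgtable)
{
	switch (type) {
	case MODEL_DOMAIN_IDENTITY:
		return true;
	case MODEL_DOMAIN_DMA:
	case MODEL_DOMAIN_DMA_FQ:
		return pgtable == MODEL_PGTABLE_V2;
	default:
		return false;
	}
}

/*
 * Simplified version of the current policy: PASID-capable devices are
 * steered to identity; all others get no preference.
 */
static enum model_domain_type model_def_domain_type(bool pasid_capable)
{
	return pasid_capable ? MODEL_DOMAIN_IDENTITY : MODEL_DOMAIN_NO_PREFERENCE;
}
```

Within this model, one plausible direction for the fix (an assumption, not something stated in this thread) would be to stop forcing identity for PASID-capable devices when the v2 page table is active, since model_pasid_usable() already accepts DMA and DMA_FQ in that configuration.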
Hope this clarifies.
-Vasant