Re: [REGRESSION 5.19.x] AMD HD-audio devices missing on 5.19

From: Jason Gunthorpe
Date: Mon Aug 22 2022 - 21:00:49 EST


On Mon, Aug 22, 2022 at 04:12:59PM +0200, Takashi Iwai wrote:
> Hi,
>
> we've received regression reports about the missing HD-audio devices
> on AMD platforms, and this turned out to be caused by the commit
> 512881eacfa72c2136b27b9934b7b27504a9efc2
> bus: platform,amba,fsl-mc,PCI: Add device DMA ownership management
>
> The details are found in openSUSE bugzilla:
> https://bugzilla.suse.com/show_bug.cgi?id=1202492
>
> The problem seems to be that HD-audio (both onboard analog and HDMI)
> PCI devices are assigned to the same IOMMU group as AMD graphics PCI
> device, and once after the AMDGPU is initialized beforehand, those
> audio devices can't be probed since iommu_device_use_default_domain()
> returns -EBUSY.

Can you describe exactly what drivers are involved in this? If it is
the above commit then several devices are sharing an iommu group and
one of them (well, the only one already attached, I suppose) has made
the group unsharable.

With grep I don't see an obvious place where the AMDGPU driver would
mess with the iommu configuration, so I have no guess.

It would be good to have some debugging to confirm if it is
group->owner (should be impossible, suggests memory corruption if it
is) or group->domain != group->default_domain.
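
A minimal userspace sketch of the two conditions, to make clear what the
debugging should tell apart. This is a simplified model, not the kernel
code: the structs are stand-ins, and owner_cnt, group_busy_reason() and
model_use_default_domain() are names assumed here for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Simplified stand-ins for illustration; NOT the kernel's definitions. */
struct iommu_domain { int id; };

struct iommu_group {
	struct iommu_domain *default_domain;
	struct iommu_domain *domain;	/* domain currently attached */
	void *owner;			/* non-NULL once claimed exclusively */
	unsigned int owner_cnt;		/* assumed: users counted so far */
};

/* Which of the two conditions makes the group unsharable. */
enum busy_reason { BUSY_NONE, BUSY_OWNER, BUSY_DOMAIN };

enum busy_reason group_busy_reason(const struct iommu_group *g)
{
	assert(g != NULL);
	if (g->owner)
		return BUSY_OWNER;	/* group->owner is set */
	if (g->domain != g->default_domain)
		return BUSY_DOMAIN;	/* group->domain != group->default_domain */
	return BUSY_NONE;
}

/*
 * Model of the probe-time check: a further user of an already-used
 * group is refused with -EBUSY if either condition holds. The
 * owner_cnt gate is an assumption of this sketch.
 */
int model_use_default_domain(struct iommu_group *g)
{
	if (g->owner_cnt && group_busy_reason(g) != BUSY_NONE)
		return -EBUSY;
	g->owner_cnt++;
	return 0;
}
```

Printing the equivalent of group_busy_reason() at the point where -EBUSY
is returned would be enough to say which case this is.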

Most likely it is the latter, but I can't see how that could happen on
a system like this. There is no obvious manipulation in AMDGPU, for
instance.

So debugging to find the backtrace for exactly when
group->domain != group->default_domain
occurs for the troubled group would be necessary.

If you know the group name it would be easy enough to cook up a patch
that throws a warning when group->domain changes.
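
The idea can be sketched in userspace roughly as follows. This is only a
model under stated assumptions: the structs are stand-ins, the name
field, group_set_domain_traced() and its caller argument are
hypothetical, and in a real kernel patch a WARN at the place where
group->domain is assigned would supply the backtrace that the caller
string stands in for here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Simplified stand-ins for illustration; NOT the kernel's definitions. */
struct iommu_domain { int id; };

struct iommu_group {
	const char *name;		/* hypothetical group label */
	struct iommu_domain *default_domain;
	struct iommu_domain *domain;	/* domain currently attached */
};

/*
 * Hypothetical traced setter: warn whenever the group is moved to a
 * domain that is not its default domain. Returns 1 if it warned.
 */
int group_set_domain_traced(struct iommu_group *g,
			    struct iommu_domain *new_domain,
			    const char *caller)
{
	int warned = 0;

	assert(g != NULL);
	if (g->domain != new_domain && new_domain != g->default_domain) {
		fprintf(stderr,
			"WARN: group %s: domain %p -> %p (default %p) from %s\n",
			g->name, (void *)g->domain, (void *)new_domain,
			(void *)g->default_domain, caller);
		warned = 1;
	}
	g->domain = new_domain;
	return warned;
}
```

Restricting the warning to the one troubled group would keep the log
readable on a system with many groups.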

> domain assignment. In anyway, disabling IOMMU works around the
> problem, and passing driver_managed_dma flag to the HD-audio driver
> was also confirmed to work around it, too.

Disabling the IOMMU removes the groups entirely, which disables the check.

driver_managed_dma skips the check entirely for that driver, which
raises the question of how the driver is even able to work.

If the domain is not the default_domain it is very surprising that DMA
can work at all. Since it does, something really odd has happened.

Jason