Re: [PATCH] PCI: Blacklist AMD Stoney GPU devices for ATS

From: Joerg Roedel
Date: Tue Mar 28 2017 - 16:28:25 EST


On Tue, Mar 28, 2017 at 08:18:26PM +0000, Deucher, Alexander wrote:
> > -----Original Message-----
> > From: Joerg Roedel [mailto:joro@xxxxxxxxxx]
> > Sent: Tuesday, March 28, 2017 8:17 AM
> > To: Bjorn Helgaas
> > Cc: linux-pci@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; Joerg Roedel;
> > Daniel Drake; Deucher, Alexander
> > Subject: [PATCH] PCI: Blacklist AMD Stoney GPU devices for ATS
> >
> > From: Joerg Roedel <jroedel@xxxxxxx>
> >
> > ATS is broken on these devices. Under invalidation load, the
> > GPU does not reply to invalidations anymore, causing
> > Completion-wait loop timeouts on the AMD IOMMU driver side.
> > Fix it by not enabling ATS on these devices.
> >
> > Note that the commit mentioned below is not broken; it just
> > triggers the issue because it can cause invalidation
> > storms on devices.
> >
> > Fixes: b1516a14657a ('iommu/amd: Implement flush queue')
> > Reported-by: Daniel Drake <drake@xxxxxxxxxxxx>
> > Cc: Daniel Drake <drake@xxxxxxxxxxxx>
> > Cc: Alexander Deucher <Alexander.Deucher@xxxxxxx>
> > Signed-off-by: Joerg Roedel <jroedel@xxxxxxx>
>
> Did you see Arindam's patch from yesterday[1]? Not sure which is the proper fix, maybe both?

Arindam's patch makes sense on its own, but not as a fix for this issue.
It lowers the invalidation load on the GPU, but there are still ways to
trigger a high invalidation rate on the device. So it might hide the
issue, but not fix it.

We need to disable ATS on the device if it doesn't work reliably.
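
To illustrate the idea, here is a minimal, self-contained sketch of a
quirk-table approach. It is not the patch itself and not kernel code: the
struct, the function names, and the device ID are hypothetical stand-ins
chosen for the example (only the ATI/AMD vendor ID 0x1002 is real). The
point is that a fixup clears the device's ATS capability before anyone
tries to enable it, so the enable path simply refuses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a PCI device; only the fields this sketch needs. */
struct sketch_pci_dev {
    uint16_t vendor;
    uint16_t device;
    uint16_t ats_cap;   /* offset of the ATS capability, 0 means "no usable ATS" */
};

#define SKETCH_VENDOR_ATI        0x1002  /* ATI/AMD PCI vendor ID */
#define SKETCH_DEVICE_BROKEN_ATS 0xffff  /* placeholder ID, an assumption for the example */

/* One quirk-table entry: run hook() on devices matching vendor/device. */
struct sketch_fixup {
    uint16_t vendor;
    uint16_t device;
    void (*hook)(struct sketch_pci_dev *dev);
};

/* Pretend the device has no ATS capability at all. */
static void sketch_quirk_no_ats(struct sketch_pci_dev *dev)
{
    dev->ats_cap = 0;
}

static const struct sketch_fixup sketch_fixups[] = {
    { SKETCH_VENDOR_ATI, SKETCH_DEVICE_BROKEN_ATS, sketch_quirk_no_ats },
};

/* Run every matching fixup once, before any driver touches the device. */
static void sketch_apply_fixups(struct sketch_pci_dev *dev)
{
    assert(dev != NULL);
    for (size_t i = 0; i < sizeof(sketch_fixups) / sizeof(sketch_fixups[0]); i++) {
        const struct sketch_fixup *f = &sketch_fixups[i];

        if (f->vendor == dev->vendor && f->device == dev->device)
            f->hook(dev);
    }
}

/* Enabling ATS fails (-1) when the capability was cleared, succeeds (0) otherwise. */
static int sketch_enable_ats(const struct sketch_pci_dev *dev)
{
    return dev->ats_cap ? 0 : -1;
}
```

With this shape the IOMMU side needs no special case: it asks for ATS as
usual and gets an error for the blacklisted device, while all other
devices keep working as before.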



Joerg