Re: [PATCH] iommu/amd: Fix event counter availability check

From: Paul Menzel
Date: Sun Feb 21 2021 - 08:45:25 EST


Dear Alexander,


On 01.06.20 at 04:48, Paul Menzel wrote:

[…]

On 31.05.20 at 09:22, Alexander Monakov wrote:

Adding Shuah Khan to Cc: I've noticed you've seen this issue on Ryzen 2400GE;
can you have a look at the patch? Would be nice to know if it fixes the
problem for you too.

On Fri, 29 May 2020, Alexander Monakov wrote:

The driver performs an extra check if the IOMMU's capabilities advertise
presence of performance counters: it verifies that counters are writable
by writing a hard-coded value to a counter and testing that reading that
counter gives back the same value.
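
For illustration, the write-then-read-back check described above can be modelled in user space roughly as follows. This is only a sketch and not the driver's code: struct fake_counter, counter_write, counter_read and counter_is_writable are hypothetical names, and the test pattern is an arbitrary value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one counter register; not a kernel type. */
struct fake_counter {
    uint64_t value;
    bool writable;   /* models whether the hardware accepts writes */
};

static void counter_write(struct fake_counter *c, uint64_t v)
{
    if (c->writable)
        c->value = v;   /* a non-writable counter silently drops the write */
}

static uint64_t counter_read(const struct fake_counter *c)
{
    return c->value;
}

/* Write a fixed pattern, read it back, and report whether the two match. */
static bool counter_is_writable(struct fake_counter *c)
{
    const uint64_t pattern = 0xabcd;   /* arbitrary test value (assumption) */
    uint64_t saved;
    bool ok;

    assert(c != NULL);
    saved = counter_read(c);
    counter_write(c, pattern);
    ok = (counter_read(c) == pattern);
    counter_write(c, saved);           /* put the previous value back */
    return ok;
}
```

A counter that accepts writes passes the check; one that drops them fails it and keeps its old value.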

Unfortunately it does so quite early, even before pci_enable_device is
called for the IOMMU, i.e. when accessing its MMIO space is not
guaranteed to work. On Ryzen 4500U CPU, this actually breaks the test:
the driver assumes the counters are not writable, and disables the
functionality.

Moving init_iommu_perf_ctr just after iommu_flush_all_caches resolves
the issue. This is the earliest point in amd_iommu_init_pci where the
call succeeds on my laptop.
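
To make the ordering problem concrete, here is a small user-space model of it. It is a sketch under stated assumptions, not kernel code: struct fake_dev, fake_enable_device, mmio_write, mmio_read and probe_counter are hypothetical names, and the behaviour of a not-yet-enabled device (writes dropped, reads returning all-ones) is an assumption made only for this model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a device with one counter behind MMIO. */
struct fake_dev {
    bool enabled;       /* set by fake_enable_device() */
    uint64_t counter;
};

static void fake_enable_device(struct fake_dev *d)
{
    d->enabled = true;
}

/* Assumption: writes to a device that is not enabled are dropped. */
static void mmio_write(struct fake_dev *d, uint64_t v)
{
    if (d->enabled)
        d->counter = v;
}

/* Assumption: reads from a device that is not enabled return all-ones. */
static uint64_t mmio_read(const struct fake_dev *d)
{
    return d->enabled ? d->counter : UINT64_MAX;
}

/* Same write-and-read-back probe; its result depends on when it runs. */
static bool probe_counter(struct fake_dev *d)
{
    const uint64_t pattern = 0xabcd;   /* arbitrary test value (assumption) */

    assert(d != NULL);
    mmio_write(d, pattern);
    return mmio_read(d) == pattern;
}
```

In this model, calling probe_counter() before fake_enable_device() reports the counter as not writable, while the same call afterwards succeeds, which mirrors the effect of moving the check later in initialization.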

Signed-off-by: Alexander Monakov <amonakov@xxxxxxxxx>
Cc: Joerg Roedel <joro@xxxxxxxxxx>
Cc: Suravee Suthikulpanit <suravee.suthikulpanit@xxxxxxx>
Cc: iommu@xxxxxxxxxxxxxxxxxxxxxxxxxx
---

PS. I'm seeing another hiccup with IOMMU probing on my system:
pci 0000:00:00.2: can't derive routing for PCI INT A
pci 0000:00:00.2: PCI INT A: not connected

Hopefully I can figure it out, but I'd appreciate hints.

I guess it’s a firmware bug, but I contacted the linux-pci folks [1].

Unfortunately, it’s still present in Linux 5.11.

  drivers/iommu/amd_iommu_init.c | 6 +++---
  1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index 5b81fd16f5fa..1b7ec6b6a282 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -1788,8 +1788,6 @@ static int __init iommu_init_pci(struct amd_iommu *iommu)
      if (iommu->cap & (1UL << IOMMU_CAP_NPCACHE))
          amd_iommu_np_cache = true;
-    init_iommu_perf_ctr(iommu);
-
      if (is_rd890_iommu(iommu->dev)) {
          int i, j;
@@ -1891,8 +1889,10 @@ static int __init amd_iommu_init_pci(void)
      init_device_table_dma();
-    for_each_iommu(iommu)
+    for_each_iommu(iommu) {
          iommu_flush_all_caches(iommu);
+        init_iommu_perf_ctr(iommu);
+    }
      if (!ret)
          print_iommu_info();

base-commit: 75caf310d16cc5e2f851c048cd597f5437013368

Thank you very much for fixing this issue, which is almost two years old for me.

Tested-by: Paul Menzel <pmenzel@xxxxxxxxxxxxx>
MSI MSI MS-7A37/B350M MORTAR with AMD Ryzen 3 2200G
Link: https://lore.kernel.org/linux-iommu/20180727102710.GA6738@xxxxxxxxxx/

Just a small note: I am applying your patch, but it looks like there is still some timing issue. At least today, I noticed it during one boot with Linux 5.11. (Before today, I had never noticed it again in the several years, but I do not always pay attention and do not save the logs.)


Kind regards,

Paul


[1]: https://lore.kernel.org/linux-pci/8579bd14-e369-1141-917b-204d20cff528@xxxxxxxxxxxxx/