Re: [PATCH 1/2] iommu/iova: Flush CPU rcache for when a depot fills

From: John Garry
Date: Fri Sep 25 2020 - 10:37:35 EST


On 25/09/2020 12:53, Robin Murphy wrote:
---
  drivers/iommu/iova.c | 25 ++++++++++++++++---------
  1 file changed, 16 insertions(+), 9 deletions(-)

diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index 45a251da5453..05e0b462e0d9 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -892,9 +892,8 @@ static bool __iova_rcache_insert(struct iova_domain *iovad,
                   struct iova_rcache *rcache,
                   unsigned long iova_pfn)
  {
-    struct iova_magazine *mag_to_free = NULL;
      struct iova_cpu_rcache *cpu_rcache;
-    bool can_insert = false;
+    bool can_insert = false, flush = false;
      unsigned long flags;
      cpu_rcache = raw_cpu_ptr(rcache->cpu_rcaches);
@@ -913,13 +912,19 @@ static bool __iova_rcache_insert(struct iova_domain *iovad,
              if (rcache->depot_size < MAX_GLOBAL_MAGS) {
                  rcache->depot[rcache->depot_size++] =
                          cpu_rcache->loaded;
+                can_insert = true;
+                cpu_rcache->loaded = new_mag;
              } else {
-                mag_to_free = cpu_rcache->loaded;
+                /*
+                 * The depot is full, meaning that a very large
+                 * cache of IOVAs has built up, which slows
+                 * down RB tree accesses significantly
+                 * -> let's flush at this point.
+                 */
+                flush = true;
+                iova_magazine_free(new_mag);
              }
              spin_unlock(&rcache->lock);
-
-            cpu_rcache->loaded = new_mag;
-            can_insert = true;
          }
      }
@@ -928,9 +933,11 @@ static bool __iova_rcache_insert(struct iova_domain *iovad,
      spin_unlock_irqrestore(&cpu_rcache->lock, flags);
-    if (mag_to_free) {
-        iova_magazine_free_pfns(mag_to_free, iovad);
-        iova_magazine_free(mag_to_free);
+    if (flush) {

Do you really need this flag, or is it effectively just mirroring "!can_insert"? In theory, if there wasn't enough memory to allocate a new magazine, then freeing some more IOVAs wouldn't necessarily be a bad thing to do anyway.

Right, I can reuse can_insert.
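
i.e. the tail of __iova_rcache_insert() becomes something like this (untested sketch only - the flush body stays whatever the patch ends up doing, trimmed in the quote above):

	spin_unlock_irqrestore(&cpu_rcache->lock, flags);

	/*
	 * !can_insert covers the depot-full case and also the case where
	 * allocating new_mag failed - as you say, flushing some IOVAs in
	 * the latter case shouldn't hurt either.
	 */
	if (!can_insert) {
		/* same flush body as in the patch */
	}

	return can_insert;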


Other than that, I think this looks reasonable. Every time I look at __iova_rcache_insert() I'm convinced there must be a way to restructure it to be more streamlined overall, but I can never quite see exactly how...


We could remove the new_mag check, but the rest of the code cannot safely handle cpu_rcache->loaded/prev being NULL. Indeed, I think that the mainline code has a bug:

If the initial allocation for the loaded/prev magazines fails (returning NULL) in init_iova_rcaches(), then in __iova_rcache_insert():

	if (!iova_magazine_full(cpu_rcache->loaded)) {
		can_insert = true;

If cpu_rcache->loaded == NULL, then can_insert is assigned true -> bang, as the crash below (from my experiments) shows. This needs to be fixed...
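
To spell out why: iova_magazine_full() treats a NULL magazine as not full, so the push is still attempted. From mainline (quoting from memory, so treat as a paraphrase):

	static bool iova_magazine_full(struct iova_magazine *mag)
	{
		return (mag && mag->size == IOVA_MAG_SIZE);
	}

	static void iova_magazine_push(struct iova_magazine *mag, unsigned long pfn)
	{
		BUG_ON(iova_magazine_full(mag));

		mag->pfns[mag->size++] = pfn;
	}

With mag == NULL, the BUG_ON() doesn't fire and mag->pfns[mag->size++] is the NULL dereference. One possible direction (untested, just a sketch) is to make the can_insert checks also require a non-NULL magazine, e.g.:

	if (cpu_rcache->loaded && !iova_magazine_full(cpu_rcache->loaded)) {
		can_insert = true;

though handling the allocation failure in init_iova_rcaches() itself would be cleaner.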

Thanks,
John



Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
[ 10.195299] Mem abort info:
[ 10.198080] ESR = 0x96000004
[ 10.201121] EC = 0x25: DABT (current EL), IL = 32 bits
[ 10.206418] SET = 0, FnV = 0
[ 10.209459] EA = 0, S1PTW = 0
[ 10.212585] Data abort info:
[ 10.215452] ISV = 0, ISS = 0x00000004
[ 10.219274] CM = 0, WnR = 0
[ 10.222228] [0000000000000000] user address but active_mm is swapper
[ 10.228569] Internal error: Oops: 96000004 [#1] PREEMPT SMP
[ 10.234127] Modules linked in:
[ 10.237170] CPU: 11 PID: 696 Comm: irq/40-hisi_sas Not tainted 5.9.0-rc5-47738-gb1ead657a3fa-dirty #658
[ 10.246548] Hardware name: Huawei D06 /D06, BIOS Hisilicon D06 UEFI RC0 - V1.16.01 03/15/2019
[ 10.255058] pstate: 60c00089 (nZCv daIf +PAN +UAO BTYPE=--)
[ 10.260620] pc : free_iova_fast+0xfc/0x280
[ 10.264703] lr : free_iova_fast+0x94/0x280
[ 10.268785] sp : ffff80002477bbb0
[ 10.272086] x29: ffff80002477bbb0 x28: 0000000000000000
[ 10.277385] x27: ffff002bc8fbb940 x26: ffff002bc727e26c
[ 10.282684] x25: 0000000000000000 x24: ffff002bc9439008
[ 10.287982] x23: 00000000000fdffe x22: 0000000000000080
[ 10.293280] x21: ffff002bc9439008 x20: 0000000000000000
[ 10.298579] x19: fffff403e9ebb700 x18: ffffffffffffffff
[ 10.303877] x17: 0000000000000001 x16: 0000000000000000
[ 10.309176] x15: 000000000000ffff x14: 0000000000000040
[ 10.314474] x13: 0000000000007fff x12: 000000000001ffff
[ 10.319772] x11: 000000000000000f x10: 0000000000006000
[ 10.325070] x9 : 0000000000000000 x8 : ffff80002477b768
[ 10.330368] x7 : 0000000000000000 x6 : 000000000000003f
[ 10.335666] x5 : 0000000000000040 x4 : 0000000000000000
[ 10.340964] x3 : fffff403e9ebb700 x2 : 0000000000000000
[ 10.346262] x1 : 0000000000000000 x0 : 0000000000000000
[ 10.351561] Call trace:
[ 10.353995] free_iova_fast+0xfc/0x280
[ 10.357731] iommu_dma_free_iova+0x64/0x70
[ 10.361814] __iommu_dma_unmap+0x9c/0xf8
[ 10.365723] iommu_dma_unmap_sg+0xa8/0xc8
[ 10.369720] dma_unmap_sg_attrs+0x28/0x50
[ 10.373717] cq_thread_v3_hw+0x2dc/0x528
[ 10.377626] irq_thread_fn+0x2c/0xa0
[ 10.381188] irq_thread+0x130/0x1e0
[ 10.384664] kthread+0x154/0x158
[ 10.387879] ret_from_fork+0x10/0x34