Re: [PATCH] iommu/iova: Retry from last rb tree node if iova search fails

From: Robin Murphy
Date: Thu May 07 2020 - 14:33:53 EST


On 2020-05-07 7:22 pm, Ajay kumar wrote:
On 5/7/20, Robin Murphy <robin.murphy@xxxxxxx> wrote:
On 2020-05-06 9:01 pm, vjitta@xxxxxxxxxxxxxx wrote:
From: Vijayanand Jitta <vjitta@xxxxxxxxxxxxxx>

Whenever a new iova alloc request comes, the iova is always searched
for starting from the cached node and the nodes previous to the cached
node. So, even if there is free iova space available in the nodes
next to the cached node, iova allocation can still fail
because of this approach.

Consider the following sequence of iova allocs and frees on
1GB of iova space:

1) alloc - 500MB
2) alloc - 12MB
3) alloc - 499MB
4) free - 12MB which was allocated in step 2
5) alloc - 13MB

After the above sequence we will have 12MB of free iova space, and the
cached node will be pointing to the iova pfn of the last alloc of 13MB,
which will be the lowest iova pfn of that iova space. Now if we get an
alloc request of 2MB, we just search from the cached node and then look
at lower iova pfns for free iova; as there aren't any, the iova alloc
fails even though there is 12MB of free iova space.

Yup, this could definitely do with improving. Unfortunately I think this
particular implementation is slightly flawed...

To avoid such iova search failures, retry from the last rb tree node
when the iova search fails; this will search the entire tree and get an
iova if it's available.

Signed-off-by: Vijayanand Jitta <vjitta@xxxxxxxxxxxxxx>
---
drivers/iommu/iova.c | 11 +++++++++++
1 file changed, 11 insertions(+)

diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index 0e6a953..2985222 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -186,6 +186,7 @@ static int __alloc_and_insert_iova_range(struct iova_domain *iovad,
unsigned long flags;
unsigned long new_pfn;
unsigned long align_mask = ~0UL;
+ bool retry = false;

if (size_aligned)
align_mask <<= fls_long(size - 1);
@@ -198,6 +199,8 @@ static int __alloc_and_insert_iova_range(struct iova_domain *iovad,

curr = __get_cached_rbnode(iovad, limit_pfn);
curr_iova = rb_entry(curr, struct iova, node);
+
+retry_search:
do {
limit_pfn = min(limit_pfn, curr_iova->pfn_lo);
new_pfn = (limit_pfn - size) & align_mask;
@@ -207,6 +210,14 @@ static int __alloc_and_insert_iova_range(struct iova_domain *iovad,
} while (curr && new_pfn <= curr_iova->pfn_hi);

if (limit_pfn < size || new_pfn < iovad->start_pfn) {
+ if (!retry) {
+ curr = rb_last(&iovad->rbroot);

Why walk when there's an anchor node there already? However...
+1

+ curr_iova = rb_entry(curr, struct iova, node);
+ limit_pfn = curr_iova->pfn_lo;

...this doesn't look right, as by now we've lost the original limit_pfn
supplied by the caller, so are highly likely to allocate beyond the
range our caller asked for. In fact AFAICS we'd start allocating from
directly below the anchor node, beyond the end of the entire
address space.
+1

The logic I was imagining we want here was something like the rapidly
hacked up (and untested) diff below.

Thanks,
Robin.

----->8-----
diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index 0e6a9536eca6..3574c19272d6 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -186,6 +186,7 @@ static int __alloc_and_insert_iova_range(struct iova_domain *iovad,
unsigned long flags;
unsigned long new_pfn;
unsigned long align_mask = ~0UL;
+ unsigned long alloc_hi, alloc_lo;

if (size_aligned)
align_mask <<= fls_long(size - 1);
@@ -196,17 +197,27 @@ static int __alloc_and_insert_iova_range(struct iova_domain *iovad,
size >= iovad->max32_alloc_size)
goto iova32_full;

+ alloc_hi = IOVA_ANCHOR;
+ alloc_lo = iovad->start_pfn;
+retry:
curr = __get_cached_rbnode(iovad, limit_pfn);
curr_iova = rb_entry(curr, struct iova, node);
+ if (alloc_hi < curr_iova->pfn_hi) {
+ alloc_lo = curr_iova->pfn_hi;
+ alloc_hi = limit_pfn;
+ }
+
do {
- limit_pfn = min(limit_pfn, curr_iova->pfn_lo);
- new_pfn = (limit_pfn - size) & align_mask;
+ alloc_hi = min(alloc_hi, curr_iova->pfn_lo);
In the retry case, curr and curr_iova are not updated. Kindly check it.

Right, after we've used the cached node to set the lower limit for the retry pass, we also need to search the tree for the next node above limit_pfn for the actual starting point.

Did I mention this was a completely untested brain-dump? :D

Thanks,
Robin.

Ajay
+ new_pfn = (alloc_hi - size) & align_mask;
prev = curr;
curr = rb_prev(curr);
curr_iova = rb_entry(curr, struct iova, node);
} while (curr && new_pfn <= curr_iova->pfn_hi);

- if (limit_pfn < size || new_pfn < iovad->start_pfn) {
+ if (limit_pfn < size || new_pfn < alloc_lo) {
+ if (alloc_lo == iovad->start_pfn)
+ goto retry;
iovad->max32_alloc_size = size;
goto iova32_full;
}
_______________________________________________
iommu mailing list
iommu@xxxxxxxxxxxxxxxxxxxxxxxxxx
https://lists.linuxfoundation.org/mailman/listinfo/iommu