Re: [PATCH v5 10/12] iommu: Prepare IOMMU domain for IOPF

From: Baolu Lu
Date: Fri May 06 2022 - 01:40:31 EST


On 2022/5/5 21:38, Jean-Philippe Brucker wrote:
Hi Baolu,

On Thu, May 05, 2022 at 04:31:38PM +0800, Baolu Lu wrote:
On 2022/5/4 02:20, Jean-Philippe Brucker wrote:
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 7cae631c1baa..33449523afbe 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -3174,3 +3174,24 @@ void iommu_detach_device_pasid(struct iommu_domain *domain,
iommu_group_put(group);
}
+
+struct iommu_domain *iommu_get_domain_for_dev_pasid(struct device *dev,
+ ioasid_t pasid)
+{
+ struct iommu_domain *domain;
+ struct iommu_group *group;
+
+ if (!pasid_valid(pasid))
+ return NULL;
+
+ group = iommu_group_get(dev);
+ if (!group)
+ return NULL;
+
+ mutex_lock(&group->mutex);
Unfortunately this still causes the deadlock when unbind() flushes the
IOPF queue while holding the group mutex.

Sorry, I didn't get your point here.

Do you mean unbind() could hold group mutex before calling this helper?
The group mutex is only available in iommu.c. The unbind() has no means
to hold this lock. Or, I missed anything?

I wasn't clear, it's iommu_detach_device_pasid() that holds the
group->mutex:

iommu_sva_unbind_device() |
iommu_detach_device_pasid() |
mutex_lock(&group->mutex) |
domain->ops->detach_dev_pasid() | iopf_handle_group()
iopf_queue_flush_dev() | iommu_get_domain_for_dev_pasid()
... wait for IOPF work | mutex_lock(&group->mutex)
| ... deadlock

Ah! Yes. Thank you for the clarification.


Thanks,
Jean


Best regards,
baolu


If we make this function private to IOPF, then we can get rid of this
mutex_lock(). It's OK because:

* xarray protects its internal state with RCU, so we can call
xa_load() outside the lock.

* The domain obtained from xa_load is finalized. Its content is valid
because xarray stores the domain using rcu_assign_pointer(), which has a
release memory barrier, which pairs with data dependencies in IOPF
(domain->sva_ioas etc).

We'll need to be careful about this when allowing other users to install
a fault handler. Should be fine as long as the handler and data are
installed before the domain is added to pasid_array.

* We know the domain is valid the whole time IOPF is using it, because
unbind() waits for pending faults.

We just need a comment explaining the last point, something like:

/*
* Safe to fetch outside the group mutex because:
* - xarray protects its internal state with RCU
* - the domain obtained is either NULL or fully formed
* - the IOPF work is the only caller and is flushed before the
* domain is freed.
*/

Agreed. The mutex is needed only when domain could possibly be freed
before unbind(). In that case, we need this mutex and get a reference
from the domain. As we have dropped the domain user reference, this lock
is unnecessary.


Thanks,
Jean

+ domain = xa_load(&group->pasid_array, pasid);
+ mutex_unlock(&group->mutex);
+ iommu_group_put(group);
+
+ return domain;
+}


Best regards,
baolu