Re: [PATCH 02/11] iommu/vt-d: Optimize iotlb_sync_map for non-caching/non-RWBF modes

From: Baolu Lu
Date: Thu Jul 17 2025 - 22:58:42 EST


On 7/17/25 19:55, Jason Gunthorpe wrote:
> On Thu, Jul 17, 2025 at 10:40:01AM +0800, Baolu Lu wrote:
>> On 7/16/25 22:12, Jason Gunthorpe wrote:
>>> On Mon, Jul 14, 2025 at 12:50:19PM +0800, Lu Baolu wrote:
>>>> @@ -1833,6 +1845,8 @@ static int dmar_domain_attach_device(struct dmar_domain *domain,
>>>>  	if (ret)
>>>>  		goto out_block_translation;
>>>> +	domain->iotlb_sync_map |= domain_need_iotlb_sync_map(domain, iommu);

>>> This has no locking and is in the wrong order anyhow :(
>>>
>>> Any change to how invalidation works has to be done before attaching
>>> the HW so that the required invalidations are already happening before
>>> the HW can walk the page table.
>>>
>>> And you need to serialize somehow with concurrent map/unmap as iommufd
>>> doesn't prevent userspace from racing attach with map/unmap.

>> domain->iotlb_sync_map does not change the driver's behavior. It simply
>> indicates that there's no need to waste time calling
>> cache_tag_flush_range_np(), as it's just a no-op.

> Of course it changes the behavior, it changes what the invalidation
> callback does.
>
> Without locking you have a race situation where a PGD is visible to HW
> that requires extra flushing and the SW is not doing the extra
> flushing.
>
> Before any PGD is made visible to the HW the software must ensure all
> the required invalidations are happening.

Oh, I understand now. Since there is no synchronization between attach/
detach and map/unmap operations, the cache invalidation behavior must be
determined when a domain is allocated.
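
To make the race concrete: the map path reads this flag with no locking.
A minimal sketch of the consuming callback, assuming the shape this
series gives it (the exact code may differ):

static int intel_iommu_iotlb_sync_map(struct iommu_domain *domain,
				      unsigned long iova, size_t size)
{
	struct dmar_domain *dmar_domain = to_dmar_domain(domain);

	/*
	 * Lockless read: if attach flips the flag only after the PGD is
	 * already visible to the HW, a concurrent mapping can skip a
	 * flush the HW actually requires. Fixing the flag at domain
	 * allocation time removes that window.
	 */
	if (dmar_domain->iotlb_sync_map)
		cache_tag_flush_range_np(dmar_domain, iova, iova + size - 1);

	return 0;
}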


>> I previously discussed this with Kevin, and we agreed on a phase-by-
>> phase approach. As I mentioned, domain->iotlb_sync_map is merely a hint
>> for the driver, preventing it from looping through all cache tags to
>> determine if any cache invalidation work needs to be performed. We
>> already know it's predetermined that no work needs to be done.

> The iteration through the cache tags is done inside a lock so it
> doesn't have this race (it has the issue I mentioned setting up the
> cache tag list though).

>> RWBF is only required on some early implementations where memory
>> coherence was not yet implemented by the VT-d engine. It should be
>> difficult to find such systems in modern environments.

> Then I would set it at domain creation time, check it during attach,
> and remove this race.

How about the following changes (compiled but not tested)?

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 8db8be9b7e7d..bb00dc14275d 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -1780,18 +1780,6 @@ static int domain_setup_first_level(struct intel_iommu *iommu,
 					  __pa(pgd), flags, old);
 }
 
-static bool domain_need_iotlb_sync_map(struct dmar_domain *domain,
-				       struct intel_iommu *iommu)
-{
-	if (cap_caching_mode(iommu->cap) && intel_domain_is_ss_paging(domain))
-		return true;
-
-	if (rwbf_quirk || cap_rwbf(iommu->cap))
-		return true;
-
-	return false;
-}
-
 static int dmar_domain_attach_device(struct dmar_domain *domain,
 				     struct device *dev)
 {
@@ -1831,8 +1819,6 @@ static int dmar_domain_attach_device(struct dmar_domain *domain,
 	if (ret)
 		goto out_block_translation;
 
-	domain->iotlb_sync_map |= domain_need_iotlb_sync_map(domain, iommu);
-
 	return 0;
 
 out_block_translation:
@@ -3352,6 +3338,14 @@ intel_iommu_domain_alloc_first_stage(struct device *dev,
 		return ERR_CAST(dmar_domain);
 
 	dmar_domain->domain.ops = &intel_fs_paging_domain_ops;
+	/*
+	 * iotlb sync for map is only needed for legacy implementations that
+	 * explicitly require flushing internal write buffers to ensure memory
+	 * coherence.
+	 */
+	if (rwbf_quirk || cap_rwbf(iommu->cap))
+		dmar_domain->iotlb_sync_map = true;
+
 	return &dmar_domain->domain;
 }
 
@@ -3386,6 +3380,14 @@ intel_iommu_domain_alloc_second_stage(struct device *dev,
 	if (flags & IOMMU_HWPT_ALLOC_DIRTY_TRACKING)
 		dmar_domain->domain.dirty_ops = &intel_dirty_ops;
 
+	/*
+	 * Besides the internal write buffer flush, the caching mode used for
+	 * legacy nested translation (which utilizes shadowing page tables)
+	 * also requires iotlb sync on map.
+	 */
+	if (rwbf_quirk || cap_rwbf(iommu->cap) || cap_caching_mode(iommu->cap))
+		dmar_domain->iotlb_sync_map = true;
+
 	return &dmar_domain->domain;
 }
 
@@ -3446,6 +3448,11 @@ static int paging_domain_compatible_first_stage(struct dmar_domain *dmar_domain,
 	if (!cap_fl1gp_support(iommu->cap) &&
 	    (dmar_domain->domain.pgsize_bitmap & SZ_1G))
 		return -EINVAL;
+
+	/* iotlb sync on map requirement */
+	if ((rwbf_quirk || cap_rwbf(iommu->cap)) && !dmar_domain->iotlb_sync_map)
+		return -EINVAL;
+
 	return 0;
 }
 
@@ -3469,6 +3476,12 @@ paging_domain_compatible_second_stage(struct dmar_domain *dmar_domain,
 		return -EINVAL;
 	if (!(sslps & BIT(1)) && (dmar_domain->domain.pgsize_bitmap & SZ_1G))
 		return -EINVAL;
+
+	/* iotlb sync on map requirement */
+	if ((rwbf_quirk || cap_rwbf(iommu->cap) || cap_caching_mode(iommu->cap)) &&
+	    !dmar_domain->iotlb_sync_map)
+		return -EINVAL;
+
 	return 0;
 }

--
2.43.0
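
For completeness, the reason this hint saves work on every mapping is
that the core calls the iotlb_sync_map hook after each successful map.
A simplified sketch of the generic path (based on drivers/iommu/iommu.c;
details vary by kernel version):

static int iommu_sync_map(struct iommu_domain *domain, unsigned long iova,
			  size_t size)
{
	const struct iommu_domain_ops *ops = domain->ops;

	/*
	 * Runs after every successful iommu_map(), so a cheap early
	 * return in the driver callback pays off on every mapping.
	 */
	if (!ops->iotlb_sync_map)
		return 0;

	return ops->iotlb_sync_map(domain, iova, size);
}

With the flag fixed at allocation time, attach only needs the read-only
compatibility checks above (paging_domain_compatible_*), so no ordering
against concurrent map/unmap is required.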

Thanks,
baolu