Re: [PATCH 01/12] iommu/vt-d: Use iommu_get_domain_for_dev() in debugfs

From: Robin Murphy
Date: Tue May 31 2022 - 12:01:58 EST


On 2022-05-31 16:13, Jason Gunthorpe wrote:
> On Tue, May 31, 2022 at 04:01:04PM +0100, Robin Murphy wrote:
>> On 2022-05-31 15:53, Jason Gunthorpe wrote:
>>> On Tue, May 31, 2022 at 10:11:18PM +0800, Baolu Lu wrote:
>>>> On 2022/5/31 21:10, Jason Gunthorpe wrote:
>>>>> On Tue, May 31, 2022 at 11:02:06AM +0800, Baolu Lu wrote:

>>>>>> For case 2, it is a bit weird. I tried to add an rwsem to make
>>>>>> iommu_unmap() and the debugfs page-table dump mutually exclusive.
>>>>>> That does not work, because debugfs may itself depend on DMA from
>>>>>> the devices. It seems all we can do is allow the race, but check
>>>>>> the validity of the physical address retrieved from each page
>>>>>> table entry as debugfs traverses the table. The worst case is
>>>>>> then printing some useless information.

>>>>> Sounds horrible, don't you have locking around the IOPTEs of some
>>>>> kind? How does updating them work reliably?

>>>> There's no locking around updating the IOPTEs. The basic assumption
>>>> is that, at any time, only a single thread manipulates the mappings
>>>> of the range passed to the iommu_map()/iommu_unmap() APIs. The race
>>>> therefore only exists when multiple ranges share some higher-level
>>>> IOPTEs; the IOMMU driver updates those IOPTEs using a
>>>> compare-and-exchange atomic operation.

>>> Oh? Did I miss where that was documented as part of the iommu API?
>>>
>>> Daniel posted patches for VFIO to multi-thread iommu_domain mapping.
>>>
>>> iommufd goes out of its way to avoid this kind of serialization so
>>> that userspace can map IOVA in parallel.
>>>
>>> I think if this is the requirement then the iommu API needs to
>>> provide a lock around the domain for the driver..

>> Eww, no, we can't kill performance by forcing serialisation on the
>> entire API just for one silly driver-internal debugfs corner :(

> I'm not worried about debugfs, I'm worried about these efforts to
> speed up VFIO VM booting by parallel domain loading:
>
> https://lore.kernel.org/kvm/20220106004656.126790-1-daniel.m.jordan@xxxxxxxxxx/
>
> The DMA API should maintain its own external lock, but the general
> domain interface to the rest of the kernel should be safe, IMHO.

The DMA API doesn't need locking, partly since it can trust itself not
to do stupid things, and mostly because it's DMA API performance that's
fundamentally incompatible with serialisation anyway. Why do you think
we have a complicated per-CPU IOVA caching mechanism, if not to support
big multi-queue devices with multiple CPU threads mapping/unmapping in
different parts of the same DMA domain concurrently?

> Or at least it should be documented..

As far as I'm aware there's never been any restriction at the IOMMU API
level. It should be self-evident that racing concurrent
iommu_{map,unmap} requests against iommu_domain_free(), or against each
other for overlapping IOVAs, is a bad idea, and don't do that if you
want a well-defined outcome. The simpler drivers already serialise on a
per-domain lock internally, while the more performance-focused ones
implement lock-free atomic pagetable management in a similar style to
CPU arch code; either way it should work fine as-is.

The difference with debugfs is that it's a completely orthogonal
side-channel - an iommu_domain user like VFIO or iommu-dma can make
sure its *own* API usage is sane, but it can't be aware of the user
triggering some driver-internal introspection of that domain in a
manner that could race more harmfully. It has to be down to individual
drivers to make that safe if they choose to expose such an interface.

Robin.