Re: [PATCH RFC v1 1/2] iommu: Introduce iommu_dev_reset_prepare() and iommu_dev_reset_done()

From: Robin Murphy
Date: Tue Jun 10 2025 - 12:31:20 EST


On 2025-06-10 4:36 pm, Jason Gunthorpe wrote:
On Tue, Jun 10, 2025 at 03:40:40PM +0100, Robin Murphy wrote:
On 2025-06-10 2:04 pm, Jason Gunthorpe wrote:
On Tue, Jun 10, 2025 at 12:07:00AM -0700, Nicolin Chen wrote:
On Tue, Jun 10, 2025 at 12:26:07PM +0800, Baolu Lu wrote:
On 6/10/25 02:45, Nicolin Chen wrote:
+ ops = dev_iommu_ops(dev);

Should this be protected by group->mutext?

Not seemingly, but should require the iommu_probe_device_lock I
think.

group and ops are not permitted to change while a driver is attached..

IIRC the FLR code in PCI doesn't always ensure that (due to the sysfs
paths), so yeah, this looks troubled. iommu_probe_device_lock perhaps
would fix it.

No, iommu_probe_device_lock is for protecting access to dev->iommu in the
probe path until the device is definitively assigned to a group (or not).
Fundamentally it defends against multiple sources triggering a probe of the
same device in parallel - once the device *is* probed it is no longer
relevant, and the group mutex is the right thing to protect all subsequent
operations.

Yes, adding iommu_probe_device_lock to iommu_deinit_device() would be
gross.

but something is required to protect the load of
dev->iommu_group.. dev->iommu_group->mutex can't protect itself
against a race UAF on deinit.

Then you do iommu_group_get/put() around it as well. And I suppose technically you also ought to check that the device is still actually in the group once you have acquired the mutex (although perhaps that case ends up crashing anyway once we proceed to attempting to reset the already-disappeared device itself...)

READ_ONCE is good enough to protect from races with the probe path, no
need for iommu_probe_device_lock there.

In this case need to look at the PCI sysfs for races against the
iommu_release_device()/etc that is freeing the dev->iommu_group.

Maybe the sysfs is always removed before we get to release. Or maybe
the PCI FLR sysfs code should hold the device_lock..

From a quick skim I suspect it's probably OK - at least device_del() gets to bus_remove_device()->device_remove_groups() well enough before the BUS_NOTIFY_REMOVED_DEVICE call that triggers iommu_release_device().

And on an unrelated thought that's just come to mind, if we ever did FLR with PASIDs enabled, presumably that's going to wipe out the PASID configuration, so will the caller who requested the reset actually expect the attachments at the IOMMU end to be preserved, or would they assume to start over from scratch? Seems like there's not necessarily one right answer there :/

Thanks,
Robin.