Re: [PATCH v2 4/8] iommu: Factor out some helpers

From: Jason Gunthorpe
Date: Mon Jan 30 2023 - 13:21:47 EST


On Mon, Jan 30, 2023 at 06:05:12PM +0000, Robin Murphy wrote:
> > * Use a function instead of an array here because the domain-type is a
> > * bit-field, so an array would waste memory.
> > @@ -338,7 +352,7 @@ static int __iommu_probe_device(struct device *dev, struct list_head *group_list
> > dev->iommu->iommu_dev = iommu_dev;
> > dev->iommu->max_pasids = dev_iommu_get_max_pasids(dev);
> > - group = iommu_group_get_for_dev(dev);
> > + group = iommu_group_get_for_dev(dev, ops);
>
> Why? We've literally just assigned dev->iommu->iommu_dev 2 lines above; it
> is not allowed to have invalid ops. Plus in future they may potentially be a
> different set of device ops from the ones we initially found to provide
> ->probe_device - in that case we definitely want group_get_for_dev to use
> the proper instance ops. This is the only place it is (and should be)
> called, so I don't see any problem.

Sure, but IMHO it was clearer to pass the ops down than to obtain it
again. But maybe this could be tidied in a different way.

> > if (IS_ERR(group)) {
> > ret = PTR_ERR(group);
> > goto out_release;
> > @@ -414,7 +428,8 @@ int iommu_probe_device(struct device *dev)
> > mutex_unlock(&group->mutex);
> > iommu_group_put(group);
> > - ops = dev_iommu_ops(dev);
> > + /* __iommu_probe_device() set the ops */
> > + ops = dev_iommu_ops_safe(dev);
> > if (ops->probe_finalize)
> > ops->probe_finalize(dev);
> > @@ -430,14 +445,13 @@ int iommu_probe_device(struct device *dev)
> > void iommu_release_device(struct device *dev)
> > {
> > - const struct iommu_ops *ops;
> > + const struct iommu_ops *ops = dev_iommu_ops(dev);
>
> This is just moving an existing check around.

The point is to have one check. If you need to get the ops and you
don't know the state of dev then you calll dev_iommu_ops() and check
for NULL. Simple and consistent.

> > - if (!dev->iommu)
> > + if (!ops)
> > return;

As you pointed out below this isn't even coded right since iommu can
be non-NULL.

> > @@ -2620,7 +2626,11 @@ void iommu_get_resv_regions(struct device *dev, struct list_head *list)
> > {
> > const struct iommu_ops *ops = dev_iommu_ops(dev);
> > - if (ops->get_resv_regions)
> > + /*
> > + * Non-API groups still expose reserved_regions in sysfs, so filter out
> > + * calls that get here that way.
> > + */
> > + if (ops && ops->get_resv_regions)
>
> This is just moving the existing check from the public interface to
> pointlessly impose it on the internal call path as well.

Who cares? We don't need to micro-optimize this stuff. The fact there
are a few bugs already is proof enough of that.

> > ops->get_resv_regions(dev, list);
> > }
> > @@ -2979,6 +2989,11 @@ static ssize_t iommu_group_store_type(struct iommu_group *group,
> > /* Since group has only one device */
> > grp_dev = list_first_entry(&group->devices, struct group_device, list);
> > dev = grp_dev->dev;
> > + if (!dev_iommu_ops(dev)) {
> > + /* The group doesn't have an iommu driver attached */
> > + mutex_unlock(&group->mutex);
> > + return -EINVAL;
> > + }
>
> Withot any ops, group->default_domain will be NULL so we could never even
> get here.

Fair enough, deserves a comment

> > get_device(dev);
> > /*
> > @@ -3301,7 +3316,7 @@ static void __iommu_remove_group_pasid(struct iommu_group *group,
> > const struct iommu_ops *ops;
> > list_for_each_entry(device, &group->devices, list) {
> > - ops = dev_iommu_ops(device->dev);
> > + ops = dev_iommu_ops_safe(device->dev);
> > ops->remove_dev_pasid(device->dev, pasid);
> > }
> > }
> > @@ -3413,6 +3428,9 @@ struct iommu_domain *iommu_sva_domain_alloc(struct device *dev,
> > const struct iommu_ops *ops = dev_iommu_ops(dev);
> > struct iommu_domain *domain;
> > + if (!ops)
> > + return NULL;
>
> Anyone who incorrectly calls sva_domain_alloc() directly without having
> successfully called iommu_dev_enable_feature() first deserves to crash.

Lets not build assumptions like this into the code please. That
iommu_dev_enable_feature() thing is on my hitlist too :(

> (Incidentally, you've missed enable/disable_feature here.)

Yes, because they don't call dev_iommu_ops for some reason. It should
be fixed too.

> > +/* May return NULL if the device has no iommu driver */
> > static inline const struct iommu_ops *dev_iommu_ops(struct device *dev)
> > {
> > - /*
> > - * Assume that valid ops must be installed if iommu_probe_device()
> > - * has succeeded. The device ops are essentially for internal use
> > - * within the IOMMU subsystem itself, so we should be able to trust
> > - * ourselves not to misuse the helper.
> > - */
> > + if (!dev->iommu)
> > + return NULL;
> > return dev->iommu->iommu_dev->ops;
>
> This is actively broken, since dev->iommu may be non-NULL before
> dev->iommu->iommu_dev is set (a fwspec will always be set before the
> instyance is registered, and a device may potentially hang around in that
> state for evertt if the relevant IOMMU instance never appears).

Sure, I missed a NULL

> Sorry, I don't think any of this makes sense :/

The point is to be more consistent and not just blindly add more
helpers. If you need ops then ask for the ops. If you aren't sure the
dev has ops then check ops for NULL. It is pretty simple.

Jason