Re: [PATCH v2 08/10] iommufd/device: Use iommu_group_replace_domain()

From: Nicolin Chen
Date: Thu Feb 09 2023 - 16:13:35 EST


On Thu, Feb 09, 2023 at 04:00:52AM +0000, Tian, Kevin wrote:

> > From: Nicolin Chen <nicolinc@xxxxxxxxxx>
> > Sent: Wednesday, February 8, 2023 5:18 AM
> >
> > iommu_group_replace_domain() is introduced to support use cases where
> > an
> > iommu_group can be attached to a new domain without getting detached
> > from
> > the old one. This replacement feature will be useful, for cases such as:
> > 1) vPASID mode, when a guest wants to replace a single pasid (PASID=0)
> > table with a larger table (PASID=N)
> > 2) Nesting mode, when switching the attaching device from an S2 domain
> > to an S1 domain, or when switching between relevant S1 domains.
> > as it allows these cases to switch seamlessly without a DMA disruption.
> >
> > So, call iommu_group_replace_domain() in the iommufd_device_do_attach().
> > And add a __iommmufd_device_detach helper to allow the replace routine
> > to
> > do a partial detach on the current hwpt that's being replaced. Though the
> > updated locking logic is overcomplicated, it will be eased, once those
> > iopt_table_add/remove_ioas and list_add/del calls are moved to hwpt's
> > allocation/destroy() functions in the coming nesting series, as that'll
> > depend on a new ->domain_alloc_user op in the iommu core.
>
> then why not moving those changes into this series to make it simple?

The simplification depends on the new ->domain_alloc_user op and
its implementation in SMMU driver, which would be introduced by
the nesting series of VT-d and SMMU respectively.

At this point, it's hard to decide the best sequence of our three
series. Putting this replace series first simply because it seems
to be closer to get merged than the other two bigger series.

> > Also, block replace operations that are from/to auto_domains, i.e. only
> > user-allocated hw_pagetables can be replaced or replaced with.
>
> where does this restriction come from? iommu_group_replace_domain()
> can switch between any two UNMANAGED domains. What is the extra
> problem in iommufd to support from/to auto domains?

It was my misunderstanding. We should have supported that.
Will fix in v3 and add the corresponding support.

> > +/**
> > + * __iommmufd_device_detach - Detach a device from idev->hwpt to
> > new_hwpt
>
> 'from ... to ...' means a replace semantics. then this should be called
> iommufd_device_replace_hwpt().

Had a hard time to think about the naming, it's indeed a detach-
only routine, but it takes a parameter named new_hwpt...

Perhaps I should just follow Yi's suggestion by rephrasing the
narrative of this function.

> > +static void __iommmufd_device_detach(struct iommufd_device *idev,
> > + struct iommufd_hw_pagetable
> > *new_hwpt,
> > + bool detach_group)
> > +{
> > + struct iommufd_hw_pagetable *hwpt = idev->hwpt;
> > + struct iommufd_ioas *new_ioas = NULL;
> > +
> > + if (new_hwpt)
> > + new_ioas = new_hwpt->ioas;
> > +
> > + mutex_lock(&hwpt->devices_lock);
> > + list_del(&idev->devices_item);
> > + if (hwpt->ioas != new_ioas)
> > + mutex_lock(&hwpt->ioas->mutex);
>
> I think new_ioas->mutext was meant here.

new_hwpt is an input from an replace routine, where it holds the
new_ioas->mutex already. Yi's right that the code here is a bit
confusing. I will try to change it a bit for readability.

> > + if (!iommufd_hw_pagetable_has_group(hwpt, idev->group)) {
> > + if (list_empty(&hwpt->devices)) {
> > + iopt_table_remove_domain(&hwpt->ioas->iopt,
> > + hwpt->domain);
> > + list_del(&hwpt->hwpt_item);
> > + }
>
> I'm not sure how this can be fully shared between detach and replace.
> Here some work e.g. above needs to be done before calling
> iommu_group_replace_domain() while others can be done afterwards.

This iopt_table_remove_domain/list_del is supposed to be done in
the hwpt's destroy() actually. We couldn't move it because it'd
need the new domain_alloc_user op and its implementation in ARM
driver. Overall, I think it should be safe to put it behind the
iommu_group_replace_domain().

Thanks
Nic