RE: [PATCH v2 08/10] iommufd/device: Use iommu_group_replace_domain()

From: Tian, Kevin
Date: Wed Feb 08 2023 - 23:03:11 EST


> From: Nicolin Chen <nicolinc@xxxxxxxxxx>
> Sent: Wednesday, February 8, 2023 5:18 AM
>
> iommu_group_replace_domain() is introduced to support use cases where
> an
> iommu_group can be attached to a new domain without getting detached
> from
> the old one. This replacement feature will be useful, for cases such as:
> 1) vPASID mode, when a guest wants to replace a single pasid (PASID=0)
> table with a larger table (PASID=N)
> 2) Nesting mode, when switching the attaching device from an S2 domain
> to an S1 domain, or when switching between relevant S1 domains.
> as it allows these cases to switch seamlessly without a DMA disruption.
>
> So, call iommu_group_replace_domain() in the iommufd_device_do_attach().
> And add a __iommmufd_device_detach helper to allow the replace routine
> to
> do a partial detach on the current hwpt that's being replaced. Though the
> updated locking logic is overcomplicated, it will be eased, once those
> iopt_table_add/remove_ioas and list_add/del calls are moved to hwpt's
> allocation/destroy() functions in the coming nesting series, as that'll
> depend on a new ->domain_alloc_user op in the iommu core.

then why not moving those changes into this series to make it simple?

>
> Also, block replace operations that are from/to auto_domains, i.e. only
> user-allocated hw_pagetables can be replaced or replaced with.

where does this restriction come from? iommu_group_replace_domain()
can switch between any two UNMANAGED domains. What is the extra
problem in iommufd to support from/to auto domains?

>
> Signed-off-by: Nicolin Chen <nicolinc@xxxxxxxxxx>
> ---
> drivers/iommu/iommufd/device.c | 101 +++++++++++++++++-------
> drivers/iommu/iommufd/iommufd_private.h | 2 +
> 2 files changed, 76 insertions(+), 27 deletions(-)
>
> diff --git a/drivers/iommu/iommufd/device.c
> b/drivers/iommu/iommufd/device.c
> index b8c3e3baccb5..8a9834fc129a 100644
> --- a/drivers/iommu/iommufd/device.c
> +++ b/drivers/iommu/iommufd/device.c
> @@ -9,6 +9,8 @@
> #include "io_pagetable.h"
> #include "iommufd_private.h"
>
> +MODULE_IMPORT_NS(IOMMUFD_INTERNAL);
> +
> static bool allow_unsafe_interrupts;
> module_param(allow_unsafe_interrupts, bool, S_IRUGO | S_IWUSR);
> MODULE_PARM_DESC(
> @@ -194,9 +196,61 @@ static bool
> iommufd_hw_pagetable_has_group(struct iommufd_hw_pagetable *hwpt,
> return false;
> }
>
> +/**
> + * __iommmufd_device_detach - Detach a device from idev->hwpt to
> new_hwpt

'from ... to ...' means a replace semantics. then this should be called
iommufd_device_replace_hwpt().

> + * @idev: device to detach
> + * @new_hwpt: new hw_pagetable to attach (pass in NULL for a simple
> detach)
> + * @detach_group: flag to call iommu_detach_group
> + *
> + * This is a cleanup helper shared by the replace and detach routines.
> Comparing
> + * to a detach routine, a replace routine only needs a partial detach
> procedure:
> + * it does not need the iommu_detach_group(); it will attach the device to a
> new
> + * hw_pagetable after a partial detach from the currently attached
> hw_pagetable,
> + * so certain steps can be skipped if two hw_pagetables have the same IOAS.
> + */
> +static void __iommmufd_device_detach(struct iommufd_device *idev,
> + struct iommufd_hw_pagetable
> *new_hwpt,
> + bool detach_group)
> +{
> + struct iommufd_hw_pagetable *hwpt = idev->hwpt;
> + struct iommufd_ioas *new_ioas = NULL;
> +
> + if (new_hwpt)
> + new_ioas = new_hwpt->ioas;
> +
> + mutex_lock(&hwpt->devices_lock);
> + list_del(&idev->devices_item);
> + if (hwpt->ioas != new_ioas)
> + mutex_lock(&hwpt->ioas->mutex);

I think new_ioas->mutext was meant here.

> + if (!iommufd_hw_pagetable_has_group(hwpt, idev->group)) {
> + if (list_empty(&hwpt->devices)) {
> + iopt_table_remove_domain(&hwpt->ioas->iopt,
> + hwpt->domain);
> + list_del(&hwpt->hwpt_item);
> + }

I'm not sure how this can be fully shared between detach and replace.
Here some work e.g. above needs to be done before calling
iommu_group_replace_domain() while others can be done afterwards.