Re: [PATCH RFCv1 05/14] iommufd: Add IOMMUFD_OBJ_VIOMMU and IOMMUFD_CMD_VIOMMU_ALLOC
From: Nicolin Chen
Date: Mon May 13 2024 - 00:34:31 EST
On Sun, May 12, 2024 at 11:27:45AM -0300, Jason Gunthorpe wrote:
> On Fri, Apr 12, 2024 at 08:47:02PM -0700, Nicolin Chen wrote:
> > + viommu = iommu_dev->ops->viommu_alloc(idev->dev, cmd->type,
> > + hwpt_paging->common.domain);
> > + if (IS_ERR(viommu)) {
> > + rc = PTR_ERR(viommu);
> > + goto out_put_hwpt;
> > + }
>
> Ah you did already include the S2, So should it be
> domain->viommu_alloc() then?
We can do that. In that case, the VIOMMU_ALLOC ioctl should be
simply per S2 HWPT too v.s. per IDEV.
> > +
> > + /* iommufd_object_finalize will store the viommu->obj.id */
> > + rc = xa_alloc(&ucmd->ictx->objects, &viommu->obj.id, XA_ZERO_ENTRY,
> > + xa_limit_31b, GFP_KERNEL_ACCOUNT);
> > + if (rc)
> > + goto out_free;
> > +
> > + viommu->obj.type = IOMMUFD_OBJ_VIOMMU;
>
> See my other notes, lets try not to open code this.
Ack.
> > + viommu->type = cmd->type;
> > +
> > + viommu->ictx = ucmd->ictx;
> > + viommu->hwpt = hwpt_paging;
> > + viommu->iommu_dev = idev->dev->iommu->iommu_dev;
> > + cmd->out_viommu_id = viommu->obj.id;
> > + rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd));
> > + if (rc)
> > + goto out_erase_xa;
> > + iommufd_object_finalize(ucmd->ictx, &viommu->obj);
> > + refcount_inc(&viommu->hwpt->common.obj.users);
> > + goto out_put_hwpt;
> > +
> > +out_erase_xa:
> > + xa_erase(&ucmd->ictx->objects, viommu->obj.id);
> > +out_free:
> > + if (viommu->ops && viommu->ops->free)
> > + viommu->ops->free(viommu);
> > + kfree(viommu);
>
> This really should use the abort flow. The driver free callback has to
> be in the object release..
Yea, with the original object allocator, we probably can do abort().
> > +
> > +/**
> > + * enum iommu_viommu_type - VIOMMU Type
> > + * @IOMMU_VIOMMU_TEGRA241_CMDQV: NVIDIA Tegra241 CMDQV Extension for SMMUv3
> > + */
> > +enum iommu_viommu_type {
> > + IOMMU_VIOMMU_TYPE_TEGRA241_CMDQV,
> > +};
>
> At least the 241 line should be in a following patch
It's for the "enum iommu_viommu_type" mentioned in the following
structure. Yi told me that you don't like an empty enum, and he
did something like this in HWPT_INVALIDATE series:
https://lore.kernel.org/linux-iommu/20240111041015.47920-3-yi.l.liu@xxxxxxxxx/
> > +/**
> > + * struct iommu_viommu_alloc - ioctl(IOMMU_VIOMMU_ALLOC)
> > + * @size: sizeof(struct iommu_viommu_alloc)
> > + * @flags: Must be 0
> > + * @type: Type of the VIOMMU object. Must be defined in enum iommu_viommu_type
> > + * @dev_id: The device to allocate this virtual IOMMU for
> > + * @hwpt_id: ID of a nested parent HWPT
> > + * @out_viommu_id: Output virtual IOMMU ID for the allocated object
> > + *
> > + * Allocate an virtual IOMMU object that holds a (shared) nested parent HWPT
> > + */
> > +struct iommu_viommu_alloc {
> > + __u32 size;
> > + __u32 flags;
> > + __u32 type;
> > + __u32 dev_id;
> > + __u32 hwpt_id;
> > + __u32 out_viommu_id;
> > +};
>
> This seems fine.
>
> Let's have a following patch to change the hwpt_alloc to accept the
> viommu as a hwpt as a uAPI change as well.
>
> The more I think about how this needs to work the more sure I am that
> we need to do that.
>
> ARM will need a fairly tricky set of things to manage the VMID
> lifecycle. In BTM mode the VMID must come from the KVM. For vcmdq the
> VMID is needed to create the queue/viommu. For AMD the S2 is needed to
> create the VIOMMU in the first place.
>
> So, to make this all work perfectly we need approx the following
> - S2 sharing across instances in ARM - meaning the VMID is allocated
> at attach not domain alloc
> - S2 hwpt is refcounted by the VIOMMU in the iommufd layer
> - VIOMMU is refcounted by every nesting child in the iommufd layer
> - The nesting child holds a pointer to both the S2 and the VIOMMU
> (viommu optional)
> - When the nesting child attaches to a device the STE will source the
> VMID from the VIOMMU if present otherwise from the S2
> - "RID" attach (ie naked S2) will have to be done with a Nesting
> Child using a vSTE that indicates Identity. Then the attach logic
> will have enough information to get the VMID from the VIOMMU
What is this RID attach (naked S2) case? S1DSS_BYPASS + SVA?
> - In full VIOMMU mode the S2 will never get a VMID of its own, it
> will always use the VIOMMU. Life cycle is simple, the VMID is freed
> when the VIOMMU is freed. That can't happen until all Nesting
> Children are freed. That can't happen until all Nesting Children
> are detached from devices. Detatching removes the HW touch of the VMID.
So, each VM will have one S2 HWPT/domain/iopt, but each VM can
have multiple VIOMMU instances sharing that single S2 HWPT, and
each VIOMMU instance (in the SMMU driver at least) holds a vmid.
This seems to be a quite clear big picture now!
> At this point you don't need the full generality, but let's please get
> ready and get the viommu pointer available in all the right spots and
> we can keep the current logic to borrow the VMID from the S2 for the
> VIOMMU.
Yea. Will try as much as I can.
Thanks
Nicolin