Re: [PATCH RFC RESEND 3/6] iommu/riscv: support GSCID

From: Jason Gunthorpe
Date: Tue May 07 2024 - 12:25:53 EST


On Tue, May 07, 2024 at 11:52:15PM +0800, Zong Li wrote:
> On Tue, May 7, 2024 at 11:15 PM Jason Gunthorpe <jgg@xxxxxxxx> wrote:
> >
> > On Tue, May 07, 2024 at 10:25:57PM +0800, Zong Li wrote:
> > > @@ -919,29 +924,43 @@ static void riscv_iommu_iotlb_inval(struct riscv_iommu_domain *domain,
> > > rcu_read_lock();
> > >
> > > prev = NULL;
> > > - list_for_each_entry_rcu(bond, &domain->bonds, list) {
> > > - iommu = dev_to_iommu(bond->dev);
> > >
> > > - /*
> > > - * IOTLB invalidation request can be safely omitted if already sent
> > > - * to the IOMMU for the same PSCID, and with domain->bonds list
> > > - * arranged based on the device's IOMMU, it's sufficient to check
> > > - * last device the invalidation was sent to.
> > > - */
> > > - if (iommu == prev)
> > > - continue;
> > > -
> > > - riscv_iommu_cmd_inval_vma(&cmd);
> > > - riscv_iommu_cmd_inval_set_pscid(&cmd, domain->pscid);
> > > - if (len && len >= RISCV_IOMMU_IOTLB_INVAL_LIMIT) {
> > > - for (iova = start; iova < end; iova += PAGE_SIZE) {
> > > - riscv_iommu_cmd_inval_set_addr(&cmd, iova);
> > > + /*
> > > + * Host domain needs to flush entries in stage-2 for MSI mapping.
> > > + * However, device is bound to s1 domain instead of s2 domain.
> > > + * We need to flush mapping without looping devices of s2 domain
> > > + */
> > > + if (domain->gscid) {
> > > + riscv_iommu_cmd_inval_gvma(&cmd);
> > > + riscv_iommu_cmd_inval_set_gscid(&cmd, domain->gscid);
> > > + riscv_iommu_cmd_send(iommu, &cmd, 0);
> > > + riscv_iommu_cmd_iofence(&cmd);
> > > + riscv_iommu_cmd_send(iommu, &cmd, RISCV_IOMMU_QUEUE_TIMEOUT);
> >
> > Is iommu null here? Where did it come from?
> >
> > This looks wrong too. The "bonds" list is sort of misnamed, it is
> > really a list of invalidation instructions. If you need a special
> > invalidation instruction for this case then you should allocate a
> > memory and add it to the bond list when the attach is done.
> >
> > Invalidation should simply iterate over the bond list and do the
> > instructions it contains, always.
>
> I messed up this piece of code while cleaning it. I will fix it in the
> next version. However, after your tips, it seems to me that we should
> allocate a new bond entry in the s2 domain's list.

Yes, when the nest is attached the S2's bond should get a GSCID
invalidation instruction and the S1's bond should get no invalidation
instruction. Bond is better understood as "paging domain invalidation
instructions".

(also if you follow this advice, then again, I don't see why the idr
allocators are global)

You have to make a decision on the user invalidation flow, and some of
that depends on what you plan to do for ATS invalidation.

It is OK to make the nested domain locked to a single iommu, enforced
at attach time, but don't use the bond list to do it.

For single iommu you'd just de-virtualize the PSCID and the ATS vRID
and stuff the commands into the single iommu. But this shouldn't
interact with the bond. The bond list is about invalidation
instructions for the paging domain. Just loosely store the owning
instace in the nesting domain struct.

Also please be careful that you don't advertise ATS support to the
guest before the kernel driver understands how to support it. You
should probably put a note to that effect in the uapi struct for the
get info patch.

Jason