Re: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs
From: Jason Gunthorpe
Date: Mon Apr 26 2021 - 08:38:25 EST
On Sun, Apr 25, 2021 at 09:24:46AM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe <jgg@xxxxxxxxxx>
> > Sent: Friday, April 23, 2021 7:50 PM
> >
> > On Fri, Apr 23, 2021 at 09:06:44AM +0000, Tian, Kevin wrote:
> >
> > > Or could we still have just one /dev/ioasid but allow userspace to create
> > > multiple gpa_ioasid_id's each associated to a different iommu domain?
> > > Then the compatibility check will be done at ATTACH_IOASID instead of
> > > JOIN_IOASID_FD.
> >
> > To my mind what makes sense is that /dev/ioasid presents a single
> > IOMMU behavior that is basically the same. This may ultimately not be
> > what we call a domain today.
> >
> > We may end up with a middle object which is a group of domains that
> > all have the same capabilities, and we define capabilities in a way
> > that most platforms have a single group of domains.
> >
> > The key capability of a group of domains is they can all share the HW
> > page table representation, so if an IOASID instantiates a page table
> > it can be assigned to any device on any domain in the group of domains.
>
> Sorry that I didn't quite get it. If a group of domains can share the
> same page table then why not just attaching all devices under those
> domains into a single domain?
Sure, if that works. But you shouldn't have things like IOMMU_CACHE
create different domains or trigger different /dev/ioasid's.
> to describe the HW page table. Ideally a new iommu domain should
> be created only when it's impossible to share an existing page table.
> Otherwise you'll get bad iotlb efficiency because each domain has its
> unique domain id (tagged in iotlb) then duplicated iotlb entries may
> exist even when a single page table is shared by those domains.
Right, fewer is better.
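To illustrate the IOTLB point quoted above, here is a toy model in C. It is only a sketch, not how any real IOMMU driver works: the structure names, the slot count and the fake page table walk are all made up for illustration. It shows that when the cache is tagged by domain id, two domains that share one page table still consume two entries for the same IOVA.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define IOTLB_SLOTS 8   /* arbitrary size for the toy cache */

/* Hypothetical IOTLB entry, tagged with the domain id. */
struct iotlb_entry {
    uint16_t domain_id;
    uint64_t iova;
    uint64_t pa;
    int valid;
};

struct iotlb {
    struct iotlb_entry e[IOTLB_SLOTS];
};

/* Stand-in for walking the one page table shared by every domain. */
static uint64_t shared_pgtable_walk(uint64_t iova)
{
    return iova + 0x100000;
}

/* Look up (domain_id, iova); on a miss, walk and fill a free slot. */
static uint64_t iotlb_translate(struct iotlb *tlb, uint16_t domain_id,
                                uint64_t iova)
{
    size_t i;

    assert(tlb);
    for (i = 0; i < IOTLB_SLOTS; i++) {
        if (tlb->e[i].valid && tlb->e[i].domain_id == domain_id &&
            tlb->e[i].iova == iova)
            return tlb->e[i].pa;        /* hit */
    }
    for (i = 0; i < IOTLB_SLOTS; i++) {
        if (!tlb->e[i].valid) {
            tlb->e[i].domain_id = domain_id;
            tlb->e[i].iova = iova;
            tlb->e[i].pa = shared_pgtable_walk(iova);
            tlb->e[i].valid = 1;
            break;
        }
    }
    return shared_pgtable_walk(iova);
}

/* Count the entries currently in use. */
static size_t iotlb_used(const struct iotlb *tlb)
{
    size_t i, n = 0;

    for (i = 0; i < IOTLB_SLOTS; i++)
        n += tlb->e[i].valid ? 1 : 0;
    return n;
}
```

Translating the same IOVA through domain ids 1 and 2 leaves two valid entries with identical physical addresses, which is the duplication being described.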
> Or, can you elaborate what is the targeted usage by having a group of
> domains which all share the same page table?
You just need to have a clear rule for what requires a new /dev/ioasid
FD - and if it maps to domains then great.
> Want to hear your opinion for one open here. There is no doubt that
> an ioasid represents a HW page table when the table is constructed by
> userspace and then linked to the IOMMU through the bind/unbind
> API. But I'm not very sure about whether an ioasid should represent
> the exact pgtable or the mapping metadata when the underlying
> pgtable is indirectly constructed through map/unmap API. VFIO does
> the latter way, which is why it allows multiple incompatible domains
> in a single container which all share the same mapping metadata.
I think VFIO's map/unmap is way too complex and we know it has bad
performance problems.
If /dev/ioasid is a single HW page table only then I would focus on that
implementation and leave it to userspace to span different
/dev/ioasids if needed.
> OK, now I see where the disconnection comes from. In my context ioasid
> is the identifier that is actually used in the wire, but seems you treat it as
> a sw-defined namespace purely for representing page tables. We should
> clear this concept first before further discussing other details. 😊
There is no general HW requirement that every IO page table be
referred to by the same PASID and this API would have to support
non-PASID IO page tables as well. So I'd keep the two things
separated in the uAPI - even though the kernel today has a global
PASID pool.
> Then following your proposal, does it mean that we need another
> interface for allocating PASID? and since ioasid means different
> thing in uAPI and in-kernel API, possibly a new name is required to
> avoid confusion?
I would suggest having two ways to control the PASID:

 1) Over /dev/ioasid, allocate a PASID for an IOASID. All future
    PASID-based usages of the IOASID will use that global PASID.

 2) Over the device FD, when the IOASID is bound, return the PASID that
    was selected. If the IOASID does not have a global PASID then the
    kernel is free to make something up. In this mode a single IOASID
    can have multiple PASIDs.

Simple things like DPDK can use #2 and potentially have better PASID
limits. Hypervisors will most likely have to use #1, but it depends on
how their vIOMMU interface works.
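
A rough userspace model of the two PASID control paths, in C. This is only a sketch of the semantics described here, not a proposed uAPI: every name in it (struct ioasid, struct device, ioasid_alloc_pasid, device_bind_ioasid, PASID_INVALID) is hypothetical, and the per-device counter is just one way the kernel could "make something up".

```c
#include <assert.h>
#include <stdint.h>

#define PASID_INVALID 0u   /* hypothetical "no PASID assigned" marker */

/* Hypothetical IOASID: a page table handle with an optional global PASID. */
struct ioasid {
    uint32_t id;
    uint32_t global_pasid;   /* PASID_INVALID until #1 is used */
};

/* Hypothetical device with its own PASID space, used only by #2. */
struct device {
    uint32_t next_local_pasid;
};

/* Stand-in for the global PASID pool. */
static uint32_t next_global_pasid = 1;

/* #1: over /dev/ioasid, allocate a global PASID for the IOASID. */
static uint32_t ioasid_alloc_pasid(struct ioasid *ioas)
{
    assert(ioas);
    if (ioas->global_pasid == PASID_INVALID)
        ioas->global_pasid = next_global_pasid++;
    return ioas->global_pasid;
}

/* #2: over the device FD, bind the IOASID and return the PASID selected. */
static uint32_t device_bind_ioasid(struct device *dev,
                                   const struct ioasid *ioas)
{
    assert(dev && ioas);
    /* A global PASID, if present, is used for every PASID-based binding. */
    if (ioas->global_pasid != PASID_INVALID)
        return ioas->global_pasid;
    /* Otherwise pick a PASID from this device's own space. */
    return dev->next_local_pasid++;
}
```

With this model, an IOASID that went through #1 reports the same PASID on every device, while an IOASID without a global PASID can end up with a different PASID on each device it is bound to.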
I think the name IOASID is fine for the uAPI, the kernel version can
be called ioasid_id or something.
(also looking at ioasid.c, why do we need such a thin and odd wrapper
around xarray?)
Jason