Re: [PATCH v4 1/2] iommu/sva: Tighten SVA bind API with explicit flags

From: Jason Gunthorpe
Date: Thu May 13 2021 - 18:31:30 EST


On Thu, May 13, 2021 at 01:22:51PM -0700, Jacob Pan wrote:
> Hi Tony,
>
> On Thu, 13 May 2021 12:57:49 -0700, "Luck, Tony" <tony.luck@xxxxxxxxx>
> wrote:
>
> > On Thu, May 13, 2021 at 12:46:21PM -0700, Jacob Pan wrote:
> > > It seems there are two options:
> > > 1. Add a new IOMMU API to set up a system PASID with a *separate* IOMMU
> > > page table/domain, mark the device as PASID-only with a flag. Use DMA
> > > APIs to explicitly map/unmap. Based on this PASID-only flag, the vendor IOMMU
> > > driver will decide whether to use system PASID domain during map/unmap.
> > > Not clear if we also need to make IOVA==kernel VA.
> > >
> > > 2. Add a new IOMMU API to setup a system PASID which points to
> > > init_mm.pgd. This API only allows trusted device to bind with the
> > > system PASID at its own risk. There is no need for DMA API. This is the
> > > same as the current code except with an explicit API.
> > >
> > > Which option?
> >
> > Option #1 looks cleaner to me. Option #2 gives access to bits
> > of memory that the users of system PASID shouldn't ever need
> > to touch ... just map regions of memory that the kernel has
> > a "struct page" for.
> >
> > What does "use DMA APIs to explicitly map/unmap" mean? Is that
> > for the whole region?
> >
> If we map the entire kernel direct map during system PASID setup, then we
> don't need to use DMA API to map/unmap certain range.
>
> I was thinking this system PASID page table could be on-demand. The mapping
> is built by explicit use of DMA map/unmap APIs.

With option 1, the PASID should work exactly like a normal RID and use
all the normal DMA APIs and IOMMU mechanisms, whatever the platform
implements. This might or might not mean an IOMMU update on every
operation.

> > I'm expecting that once this system PASID has been initialized,
> > then any accelerator device with a kernel use case would use the
> > same PASID. I.e. DSA for page clearing, IAX for ZSwap compression
> > & decompression, etc.
> >
> OK, sounds like we have to map the entire kernel VA with struct page as you
> said. So we still by-pass DMA APIs, can we all agree on that?

Option 2 should be the faster option, but it is not available in all cases.

Option 1 isn't optional. DMA and IOMMU code has to be portable and
this is the portable API.

If you want to do both option 1 and option 2 then give it a go, but in
the most common cases, with the IOMMU in a direct map, you shouldn't get
a notable performance win.

Jason