Re: [PATCH v4 1/2] iommu/sva: Tighten SVA bind API with explicit flags

From: Jason Gunthorpe
Date: Tue May 11 2021 - 07:48:55 EST


On Mon, May 10, 2021 at 08:31:45PM -0700, Jacob Pan wrote:
> Hi Jason,
>
> On Mon, 10 May 2021 20:37:49 -0300, Jason Gunthorpe <jgg@xxxxxxxxxx> wrote:
>
> > On Mon, May 10, 2021 at 06:25:07AM -0700, Jacob Pan wrote:
> >
> > > +/*
> > > + * The IOMMU_SVA_BIND_SUPERVISOR flag requests a PASID which can be used only
> > > + * for access to kernel addresses. No IOTLB flushes are automatically done
> > > + * for kernel mappings; it is valid only for access to the kernel's static
> > > + * 1:1 mapping of physical memory - not to vmalloc or even module mappings.
> > > + * A future API addition may permit the use of such ranges, by means of an
> > > + * explicit IOTLB flush call (akin to the DMA API's unmap method).
> > > + *
> > > + * It is unlikely that we will ever hook into flush_tlb_kernel_range() to
> > > + * do such IOTLB flushes automatically.
> > > + */
> > > +#define IOMMU_SVA_BIND_SUPERVISOR BIT(0)
> >
> > Huh? That isn't really SVA, can you call it something saner please?
> >
> This is a shared kernel virtual address; I am following the SVA lib naming
> since this is where the flag will be used. Why is this not SVA? A kernel
> virtual address is still a virtual address. Is it due to the direct map?

As the above explains, it doesn't actually synchronize the kernel's
address space; it just shoves the direct map into the IOMMU.

I suppose a different IOMMU implementation might point the PASID directly
at the kernel's page table and avoid those limitations - but since
that isn't portable it seems irrelevant.

Since the only thing it really maps is the direct map, I would just
call it direct_map, or all physical, or something.
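
To make the limitation concrete, here is a minimal userspace sketch (not
kernel code) of what such a PASID can reach. The base address, memory
size, and toy_* helper names are all made up for illustration; a real
kernel has its own per-arch layout and helpers. The point is only that a
direct-map address is physical plus a fixed offset, and anything outside
that window (vmalloc, modules) is simply not covered.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout, for illustration only. */
#define TOY_DIRECT_MAP_BASE 0xffff888000000000ULL
#define TOY_PHYS_MEM_SIZE   (4ULL << 30)	/* pretend 4 GiB of RAM */

/* True if [va, va + len) lies entirely inside the toy direct map. */
static bool toy_in_direct_map(uint64_t va, uint64_t len)
{
	if (va < TOY_DIRECT_MAP_BASE)
		return false;

	uint64_t off = va - TOY_DIRECT_MAP_BASE;

	/* Written this way to avoid overflow in off + len. */
	return len <= TOY_PHYS_MEM_SIZE && off <= TOY_PHYS_MEM_SIZE - len;
}

/* Direct-map address to physical: a fixed offset, no page table walk. */
static uint64_t toy_virt_to_phys(uint64_t va)
{
	return va - TOY_DIRECT_MAP_BASE;
}
```

A buffer that passes toy_in_direct_map() is one the device could be
handed; an address in some other kernel range fails the check, which is
the restriction the quoted comment describes.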

How does this interact with the DMA APIs? How do you get CPU cache
flushing/etc into PASID operations that don't trigger IOMMU updates?

Honestly, I'm not convinced we should have "kernel SVA" at all. Why
doesn't IDXD use normal DMA on the RID for kernel controlled accesses?

> > Is it really a PASID that always has all of physical memory mapped
> > into it? Sounds dangerous. What is it for?
>
> Yes. It is to bind DMA requests w/ PASID with init_mm/init_top_pgt. Per the
> PCIe spec, the PASID TLP prefix has a "Privileged Mode Requested" bit. VT-d
> supports this with "Privileged-mode-Requested (PR) flag (to distinguish user
> versus supervisor access)". Each PASID entry has an SRE (Supervisor Request
> Enable) bit.

The PR flag is only needed if the underlying IOMMU is directly
processing the CPU page tables. For cases where the IOMMU is using its
own page table format and has its own copies, the PR flag shouldn't be
used.
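
As a rough illustration of why, here is a toy model (hypothetical types
and names, not VT-d or any real IOMMU). It assumes CPU-format entries
carry a user/supervisor bit, so kernel mappings are only reachable by a
privileged request, while the IOMMU's private format has no such bit and
the privilege of the request plays no part in the check.

```c
#include <stdbool.h>

/* Toy CPU-format entry: kernel mappings have user == false. */
struct toy_cpu_pte {
	bool present;
	bool user;
};

/* Toy IOMMU-private entry: assumed to have no user/supervisor bit. */
struct toy_iommu_pte {
	bool present;
};

/* Walking CPU tables: supervisor-only pages need a privileged request. */
static bool toy_cpu_format_allows(struct toy_cpu_pte pte, bool pr)
{
	return pte.present && (pte.user || pr);
}

/* Walking the IOMMU's own copy: the PR value changes nothing. */
static bool toy_iommu_format_allows(struct toy_iommu_pte pte, bool pr)
{
	(void)pr;
	return pte.present;
}
```

In the first case a request without PR set faults on every kernel page;
in the second the result is the same with or without it.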

Jason