Re: [PATCH kernel v9 31/32] vfio: powerpc/spapr: Support multiple groups in one container if possible

From: David Gibson
Date: Fri May 01 2015 - 00:54:13 EST


On Fri, May 01, 2015 at 10:46:08AM +1000, Benjamin Herrenschmidt wrote:
> On Thu, 2015-04-30 at 19:33 +1000, Alexey Kardashevskiy wrote:
> > On 04/30/2015 05:22 PM, David Gibson wrote:
> > > On Sat, Apr 25, 2015 at 10:14:55PM +1000, Alexey Kardashevskiy wrote:
> > >> At the moment only one group per container is supported.
> > >> POWER8 CPUs have a more flexible design and allow having 2 TCE tables per
> > >> IOMMU group so we can relax this limitation and support multiple groups
> > >> per container.
> > >
> > > It's not obvious why allowing multiple TCE tables per PE has any
> > > bearing on allowing multiple groups per container.
> >
> >
> > This patchset is a global TCE tables rework (patches 1..30, roughly) with 2
> > outcomes:
> > 1. reusing the same IOMMU table for multiple groups - patch 31;
> > 2. allowing dynamic create/remove of IOMMU tables - patch 32.
> >
> > I can remove this one from the patchset and post it separately later but
> > since 1..30 aim to support both 1) and 2), I'd think I better keep them all
> > together (might explain some of changes I do in 1..30).
>
> I think you are talking past each other :-)
>
> But yes, having 2 tables per group is orthogonal to the ability of
> having multiple groups per container.
>
> The latter is made possible on P8 in large part because each PE has its
> own DMA address space (unlike P5IOC2 or P7IOC where a single address
> space is segmented).
>
> Also, on P8 you can actually make the TVT entries point to the same
> table in memory, thus removing the need to duplicate the actual
> tables (though you still have to duplicate the invalidations). I would
> however recommend only sharing the table that way within a chip/node.
>
> .../..
>
> > >>
> > >> -1) Only one IOMMU group per container is supported as an IOMMU group
> > >> -represents the minimal entity which isolation can be guaranteed for and
> > >> -groups are allocated statically, one per a Partitionable Endpoint (PE)
> > >> +1) On older systems (POWER7 with P5IOC2/IODA1) only one IOMMU group per
> > >> +container is supported as an IOMMU table is allocated at the boot time,
> > >> +one table per a IOMMU group which is a Partitionable Endpoint (PE)
> > >> (PE is often a PCI domain but not always).
>
> > > I thought the more fundamental problem was that different PEs tended
> > > to use disjoint bus address ranges, so even by duplicating put_tce
> > > across PEs you couldn't have a common address space.
>
> Yes. This is the problem with P7IOC and earlier. It *could* be doable on
> P7IOC by making them the same PE but let's not go there.
>
> > Sorry, I am not following you here.
> >
> > By duplicating put_tce, I can have multiple IOMMU groups on the same
> > virtual PHB in QEMU, "[PATCH qemu v7 04/14] spapr_pci_vfio: Enable multiple
> > groups per container" does this, the address ranges will be the same.
>
> But that is only possible on P8 because only there do we have separate
> address spaces between PEs.
>
> > What I cannot do on p5ioc2 is programming the same table to multiple
> > physical PHBs (or I could but it is very different than IODA2 and pretty
> > ugly and might not always be possible because I would have to allocate
> > these pages from some common pool and face problems like fragmentation).
>
> And P7IOC has a similar issue. The DMA address top bits indexes the
> window on P7IOC within a shared address space. It's possible to
> configure a TVT to cover multiple devices but with very serious
> limitations.

Ok. To check my understanding, does this sound reasonable:

* The table_group more-or-less represents a PE, but in a way you can
  reference without first knowing the specific IOMMU hardware type.

* When attaching multiple groups to the same container, the first PE
  (i.e. table_group) attached is used as a representative so that
  subsequent groups can be checked for compatibility with the first
  PE and therefore all PEs currently included in the container.

    - This is why the table_group appears in some places where it
      doesn't seem sensible from a pure object ownership point of
      view.
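
To make that concrete, here is a rough standalone sketch of the
"first table_group acts as the representative" check. This is not the
patch's code: the struct layouts, field names and function names
(table_group_sketch, container_sketch, container_attach_sketch) are all
invented for illustration, and the compatibility rule (each PE has its
own DMA address space, and the window geometry matches) is only my
assumption based on the discussion above.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_GROUPS_SKETCH 8

/* Hypothetical, simplified stand-in for a table_group (roughly one PE). */
struct table_group_sketch {
	unsigned int max_tables;	/* TCE tables the PE can have */
	unsigned long dma_start;	/* start of the DMA window */
	unsigned long dma_size;		/* size of the DMA window */
	bool private_addr_space;	/* PE has its own DMA address space */
};

/* Hypothetical container: just the list of attached table_groups. */
struct container_sketch {
	struct table_group_sketch *groups[MAX_GROUPS_SKETCH];
	size_t ngroups;
};

/* Assumed rule: both PEs own their address space and share geometry. */
static bool table_groups_compatible(const struct table_group_sketch *a,
				    const struct table_group_sketch *b)
{
	return a->private_addr_space && b->private_addr_space &&
	       a->max_tables == b->max_tables &&
	       a->dma_start == b->dma_start &&
	       a->dma_size == b->dma_size;
}

/*
 * Attach a group.  Only the first attached table_group is consulted:
 * every group already in the container passed the same check against
 * it, so matching the first one means matching all of them.
 * Returns 0 on success, -1 if the group is refused.
 */
int container_attach_sketch(struct container_sketch *c,
			    struct table_group_sketch *tg)
{
	if (c->ngroups == MAX_GROUPS_SKETCH)
		return -1;

	if (c->ngroups > 0 && !table_groups_compatible(c->groups[0], tg))
		return -1;

	c->groups[c->ngroups++] = tg;
	return 0;
}
```

With this rule a lone group is always accepted, even one whose PE
shares an address space with others, so the one-group-per-container
behaviour on older hardware falls out of the same check.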

--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
