Re: [PATCH v5 5/7] iommu/dma: Allow a single FQ in addition to per-CPU FQs

From: Niklas Schnelle
Date: Mon Jan 30 2023 - 10:37:18 EST


On Mon, 2023-01-30 at 15:13 +0000, Robin Murphy wrote:
> On 2023-01-24 12:50, Niklas Schnelle wrote:
> > In some virtualized environments, including s390 paged memory guests,
> > IOTLB flushes are used to update IOMMU shadow tables. Due to this, they
> > are much more expensive than in typical bare metal environments or
> > non-paged s390 guests. In addition, they may parallelize more poorly in
> > virtualized environments. This changes the trade-off for flushing IOVAs
> > such that minimizing the number of IOTLB flushes trumps any benefit of
> > cheaper queuing operations or increased parallelism.
> >
> > In this scenario per-CPU flush queues pose several problems. Firstly,
> > per-CPU memory is often quite limited, prohibiting larger queues.
> > Secondly, collecting IOVAs per CPU but flushing via a global timeout
> > reduces the number of IOVAs flushed per timeout, especially on s390
> > where PCI interrupts may not be bound to a specific CPU.
> >
> > Thus let's introduce a single flush queue mode IOMMU_DOMAIN_DMA_SQ that
> > reuses the same queue logic but only allocates a single global queue,
> > allowing larger batches of IOVAs to be freed at once and flushed with
> > larger timeouts. This lets the common IOVA flushing code more closely
> > resemble the global flush behavior used by s390's previous internal
> > DMA API implementation.
> >
> > As we now support two different variants of flush queues, rename the
> > existing __IOMMU_DOMAIN_DMA_FQ to __IOMMU_DOMAIN_DMA_LAZY to indicate
> > the general case of having a flush queue, and introduce separate
> > __IOMMU_DOMAIN_DMA_PERCPU_Q and __IOMMU_DOMAIN_DMA_SINGLE_Q bits to
> > indicate the two queue variants.
>
> Is there any actual need for the flush queue type to vary on a
> per-domain basis? All the descriptions here seem to imply that in fact
> it's always going to be a global decision one way or the other on s390,
> so if that's all we really need, we can save ourselves a bunch of
> trouble here by not having to mess with the core code at all, and just
> having some kind of switch in iommu-dma.
>
> Either way, the more I think about this the more I'm starting to agree
> that adding more domain types for iommu-dma policy is a step in the
> wrong direction. If I may, I'd like to fall back on the "or at least
> some definite internal flag" part of my original suggestion :)
>
> Thanks,
> Robin.
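
To recap the split the commit message introduces before getting to your
question, it boils down to roughly the following (a sketch only; the
exact bit positions don't matter):

#define __IOMMU_DOMAIN_PAGING       (1U << 0)  /* iommu_map()/unmap() supported */
#define __IOMMU_DOMAIN_DMA_API      (1U << 1)  /* domain usable by the DMA API  */
#define __IOMMU_DOMAIN_PT           (1U << 2)  /* identity mapped               */
#define __IOMMU_DOMAIN_DMA_LAZY     (1U << 3)  /* some flush queue is used      */
#define __IOMMU_DOMAIN_DMA_PERCPU_Q (1U << 4)  /* per-CPU flush queues          */
#define __IOMMU_DOMAIN_DMA_SINGLE_Q (1U << 5)  /* one global flush queue        */

#define IOMMU_DOMAIN_DMA_FQ (__IOMMU_DOMAIN_PAGING |   \
                             __IOMMU_DOMAIN_DMA_API |  \
                             __IOMMU_DOMAIN_DMA_LAZY | \
                             __IOMMU_DOMAIN_DMA_PERCPU_Q)
#define IOMMU_DOMAIN_DMA_SQ (__IOMMU_DOMAIN_PAGING |   \
                             __IOMMU_DOMAIN_DMA_API |  \
                             __IOMMU_DOMAIN_DMA_LAZY | \
                             __IOMMU_DOMAIN_DMA_SINGLE_Q)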

Our PCI architecture has the relevant flags per PCI device, so in theory
one device could use expensive shadow tables updated via IOTLB flushes
while another uses nested IO page table walks. This isn't the case for
any existing hardware though; there it's indeed all or nothing, so at
least for now a global switch would be enough. Even then the distinction
would only be per device, not necessarily per domain, nor would it
create a need to switch while a domain is in use. I'll discuss where
such a flag could go in an answer to Jason's mail.
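
That said, purely as a strawman for the kind of internal switch you
describe, I'm picturing something along these lines inside dma-iommu.c
(all names and numbers below are made up for illustration and exist
nowhere in the series):

enum iommu_dma_queue_type {
	IOMMU_DMA_PER_CPU_QUEUE,	/* today's per-CPU flush queues */
	IOMMU_DMA_SINGLE_QUEUE,		/* one global queue, larger batches */
};

struct iommu_dma_options {
	enum iommu_dma_queue_type qt;
	unsigned long fq_size;		/* entries before a forced flush */
	unsigned int  fq_timeout_ms;	/* flush timer period */
};

/*
 * Chosen once at cookie setup time, e.g. based on a "shadow tables /
 * expensive IOTLB flush" hint from the IOMMU driver or the device.
 */
static struct iommu_dma_options iommu_dma_choose_options(bool expensive_flush)
{
	if (expensive_flush)
		return (struct iommu_dma_options) {
			.qt = IOMMU_DMA_SINGLE_QUEUE,
			.fq_size = 32768,
			.fq_timeout_ms = 1000,
		};

	return (struct iommu_dma_options) {
		.qt = IOMMU_DMA_PER_CPU_QUEUE,
		.fq_size = 256,
		.fq_timeout_ms = 10,
	};
}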