Re: [PATCH v6 6/6] iommu/dma: Make flush queue sizes and timeout driver configurable

From: Niklas Schnelle
Date: Fri Feb 17 2023 - 10:41:22 EST


On Fri, 2023-02-17 at 09:41 -0500, Matthew Rosato wrote:
> On 2/15/23 7:03 AM, Niklas Schnelle wrote:
> > Flush queues currently use a fixed compile time size of 256 entries.
> > This being a power of 2 allows the compiler to use shift and mask
> > instead of more expensive modulo operations. With per-CPU flush queues,
> > larger queue sizes would hit per-CPU allocation limits; with a single
> > flush queue, however, these limits do not apply. Single queues are also
> > particularly suitable for virtualized environments with expensive IOTLB
> > flushes, so they benefit especially from larger queues and thus fewer
> > flushes.
> >
> > To this end, re-order struct iova_fq so we can use a dynamic array and
> > introduce the flush queue size and timeout as new options in the
> > dma_iommu_options struct. So as not to lose the shift and mask
> > optimization, check that the variable length is a power of 2 and use
> > explicit shift and mask instead of letting the compiler optimize this.
> >
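
To make the shift-and-mask part above a bit more concrete: requiring a
power-of-2 queue size lets the ring index wrap with "idx & (size - 1)"
instead of a modulo. A minimal stand-alone sketch of that idea (the
demo_fq names are made up for illustration and are not the actual
iova_fq code):

#include <stdbool.h>
#include <stdlib.h>

struct demo_fq_entry {
	unsigned long iova_pfn;
	unsigned long pages;
	unsigned long counter;
};

struct demo_fq {
	unsigned int head, tail;
	unsigned int mod_mask;          /* fq_size - 1, only valid for powers of 2 */
	struct demo_fq_entry entries[]; /* flexible array, sized at allocation time */
};

static bool demo_is_pow2(unsigned int n)
{
	return n && !(n & (n - 1));
}

static struct demo_fq *demo_fq_alloc(unsigned int fq_size)
{
	struct demo_fq *fq;

	/* Reject sizes that would force a real modulo on every wrap. */
	if (!demo_is_pow2(fq_size))
		return NULL;

	fq = calloc(1, sizeof(*fq) + fq_size * sizeof(fq->entries[0]));
	if (!fq)
		return NULL;

	fq->mod_mask = fq_size - 1;
	return fq;
}

/* Advance a ring index with a mask instead of "idx % fq_size". */
static unsigned int demo_fq_next(const struct demo_fq *fq, unsigned int idx)
{
	return (idx + 1) & fq->mod_mask;
}
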
> > In the s390 IOMMU driver a large fixed queue size and timeout are then
> > set together with single queue mode, bringing its performance for s390
> > paged memory guests on par with the previous s390-specific DMA API
> > implementation.
> >
> > Signed-off-by: Niklas Schnelle <schnelle@xxxxxxxxxxxxx>
>
> Reviewed-by: Matthew Rosato <mjrosato@xxxxxxxxxxxxx> #s390
>
> > +#define S390_IOMMU_SINGLE_FQ_SIZE 32768
> > +#define S390_IOMMU_SINGLE_FQ_TIMEOUT 1000
> > +
>
> One question about these values, however: was there a rationale for choosing these particular numbers (anything worth documenting?), or were they simply chosen because they showed similar characteristics to the previous DMA approach? I'm mostly wondering if it's worth experimenting with other values here in the future to see what kind of impact it would have.
>

For the flush queue size, which has to be a power of two, I basically
picked the smallest value that gave me similar performance under z/VM
as before the conversion. The timeout was chosen in a similar way,
though there I only tried about a dozen "nice" values. Since the
timeout basically determines how long a rogue PCI device could
potentially still access old data, I found that 1 second is also easy
to explain. But yes, these could still be tuned in the future.
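
Concretely, any such future tuning should really just mean changing the
two constants the driver feeds into the flush queue options and
re-measuring. Roughly like this (sketch only; the option struct below is
an assumed shape for illustration, not the real dma_iommu_options
layout):

/*
 * Assumed shape for illustration only; see the actual patch for the
 * real dma_iommu_options definition.
 */
struct example_fq_options {
	unsigned long fq_size;    /* number of entries, must be a power of 2 */
	unsigned int  fq_timeout; /* flush timeout in milliseconds */
};

static void s390_pick_fq_options_example(struct example_fq_options *opts)
{
	opts->fq_size    = 32768; /* S390_IOMMU_SINGLE_FQ_SIZE */
	opts->fq_timeout = 1000;  /* S390_IOMMU_SINGLE_FQ_TIMEOUT, i.e. 1 second */
}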

Thanks,
Niklas