Re: [PATCH 0/9] iommu: Refactor flush queues into iommu-dma

From: John Garry
Date: Wed Nov 24 2021 - 12:21:57 EST


On 23/11/2021 14:10, Robin Murphy wrote:
> As promised, this series cleans up the flush queue code and streamlines
> it directly into iommu-dma. Since we no longer have per-driver DMA ops
> implementations, a lot of the abstraction is now no longer necessary, so
> there's a nice degree of simplification in the process. Un-abstracting
> the queued page freeing mechanism is also the perfect opportunity to
> revise which struct page fields we use so we can be better-behaved
> from the MM point of view, thanks to Matthew.
>
> These changes should also make it viable to start using the gather
> freelist in io-pgtable-arm, and eliminate some more synchronous
> invalidations from the normal flow there, but that is proving to need a
> bit more careful thought than I have time for in this cycle, so I've
> parked that again for now and will revisit it in the new year.
>
> For convenience, branch at:
> https://gitlab.arm.com/linux-arm/linux-rm/-/tree/iommu/iova
>
> I've build-tested for x86_64, and boot-tested arm64 to the point of
> confirming that put_pages_list() gets passed a valid empty list when
> flushing, while everything else still works.

My interest is in patches 2, 3, 7, 8, and 9, and they look OK to me. I did a bit of testing in strict and non-strict mode on my arm64 system and saw no problems.

Apart from this, I noticed one possible optimization: avoiding so many reads of fq_flush_finish_cnt. We seem to have a pattern of fq_flush_iotlb()->atomic64_inc(fq_flush_finish_cnt) followed by a read of fq_flush_finish_cnt in fq_ring_free(), so we could use atomic64_inc_return(fq_flush_finish_cnt) and reuse the returned value. I think any races on the fq_flush_finish_cnt accesses are already latent, but maybe there is a flaw in that reasoning. In any case, I tried something along these lines and got a 2.4% throughput gain for my storage scenario.
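To illustrate, something like the below is roughly what I was experimenting with (only an untested sketch on top of this series, so take the exact function shapes and field names as illustrative):

/* Return the new finish counter so callers can reuse it. */
static u64 fq_flush_iotlb(struct iommu_dma_cookie *cookie)
{
	atomic64_inc(&cookie->fq_flush_start_cnt);
	/* ... IOTLB flush as before ... */
	return atomic64_inc_return(&cookie->fq_flush_finish_cnt);
}

/* Take the counter as a parameter instead of re-reading it per queue. */
static void fq_ring_free(struct iommu_dma_cookie *cookie, struct iova_fq *fq,
			 u64 counter)
{
	unsigned int idx;

	assert_spin_locked(&fq->lock);

	fq_ring_for_each(idx, fq) {
		if (fq->entries[idx].counter >= counter)
			break;
		/* ... free the entry as before ... */
	}
}

The flush paths would then call fq_flush_iotlb() once and pass the returned value into fq_ring_free() for each per-CPU queue, while callers that don't flush first would still do an atomic64_read() themselves.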

Thanks,
John