Re: [PATCH 2/4] spi: spi-fsl-dspi: Use non-coherent memory for DMA

From: Arnd Bergmann
Date: Thu Jun 12 2025 - 07:17:41 EST


On Thu, Jun 12, 2025, at 13:05, James Clark wrote:
> On 11/06/2025 10:01 am, Vladimir Oltean wrote:
>> On Tue, Jun 10, 2025 at 11:56:34AM -0400, Frank Li wrote:
>>> Can you add performance benefit information for the switch to non-coherent
>>> memory to the commit message, so reviewers can easily see your intention?
>>
>> To expand on that, you can post the output of something like this
>> (before and after):
>> $ spidev_test --device /dev/spidev1.0 --bpw 8 --size 256 --cpha --iter 10000000 --speed 10000000
>> where /dev/spidev1.0 is an unconnected chip select with a dummy entry in
>> the device tree.
>
> Coherent (before):
>
> rate: tx 385.8kbps, rx 385.8kbps
> rate: tx 1215.7kbps, rx 1215.7kbps
> rate: tx 1845.2kbps, rx 1845.2kbps
> rate: tx 1844.0kbps, rx 1844.0kbps
> rate: tx 1846.1kbps, rx 1846.1kbps
> rate: tx 1844.8kbps, rx 1844.8kbps
> rate: tx 1844.4kbps, rx 1844.4kbps
> rate: tx 1846.9kbps, rx 1846.9kbps
> rate: tx 1846.5kbps, rx 1846.5kbps
> rate: tx 1843.2kbps, rx 1843.2kbps
> rate: tx 1844.8kbps, rx 1844.8kbps
> rate: tx 1845.2kbps, rx 1845.2kbps
> rate: tx 1846.5kbps, rx 1846.5kbps
>
> Non-coherent (after):
>
> rate: tx 314.6kbps, rx 314.6kbps
> rate: tx 748.3kbps, rx 748.3kbps
> rate: tx 1845.2kbps, rx 1845.2kbps
> rate: tx 1849.3kbps, rx 1849.3kbps
> rate: tx 1846.1kbps, rx 1846.1kbps
> rate: tx 1847.3kbps, rx 1847.3kbps
> rate: tx 1845.7kbps, rx 1845.7kbps
> rate: tx 1846.5kbps, rx 1846.5kbps
> rate: tx 1844.4kbps, rx 1844.4kbps
> rate: tx 1847.3kbps, rx 1847.3kbps
> rate: tx 1847.3kbps, rx 1847.3kbps
> rate: tx 1845.7kbps, rx 1845.7kbps
> rate: tx 1846.5kbps, rx 1846.5kbps
>
> Ignoring anything less than 1800 as starting up, coherent has an average
> of 1845.2kbps and non-coherent 1846.5kbps. Not sure if that's just noise
> or an actual effect.
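
One way to judge that is to compare the gap between the two means with
the spread inside each run. A minimal sketch in Python, using the tx
numbers quoted above and the same 1800 cutoff; the standard deviation
and the helper names (steady, summarize) are additions for illustration
only:

```python
from statistics import mean, stdev

# tx rates in kbps, copied from the two runs quoted above
coherent = [385.8, 1215.7, 1845.2, 1844.0, 1846.1, 1844.8, 1844.4,
            1846.9, 1846.5, 1843.2, 1844.8, 1845.2, 1846.5]
noncoherent = [314.6, 748.3, 1845.2, 1849.3, 1846.1, 1847.3, 1845.7,
               1846.5, 1844.4, 1847.3, 1847.3, 1845.7, 1846.5]

WARMUP_CUTOFF = 1800.0  # kbps; anything below is treated as start-up


def steady(rates):
    """Drop the start-up samples below the cutoff."""
    return [r for r in rates if r >= WARMUP_CUTOFF]


def summarize(rates):
    """Return (mean, sample standard deviation) of the steady samples."""
    samples = steady(rates)
    return mean(samples), stdev(samples)


for name, rates in (("coherent", coherent), ("non-coherent", noncoherent)):
    avg, spread = summarize(rates)
    print(f"{name}: mean {avg:.1f} kbps, stdev {spread:.2f} kbps")
```

If the gap between the means is comparable to the spread within each
run, it is hard to tell apart from noise with this few samples.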

The extra cache flushes do introduce some overhead as well, so I
would expect the noncoherent case to be slightly slower for
small transfers, but faster than the coherent case for large
transfers.

"--size 256" presumably means 256 bytes, i.e. four 64-byte cachelines?
If it's easy to reproduce, can you check with smaller sizes
that still use the DMA codepath (e.g. 64 bytes) and much larger
transfers (e.g. 2048 bytes)?
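
A rough sketch of that sweep in Python, which only prints the commands
to run. It reuses the spidev_test flags from Vladimir's command line
above and assumes 64-byte cachelines; CACHELINE, cachelines() and
build_cmd() are made-up names, and whether 64 bytes really stays on the
DMA codepath still needs checking against the driver:

```python
CACHELINE = 64            # bytes; an assumption, check the target CPU
SIZES = (64, 256, 2048)   # small, original, and large transfer sizes


def cachelines(size):
    """Number of cachelines a transfer of `size` bytes touches (rounded up)."""
    return -(-size // CACHELINE)


def build_cmd(size, device="/dev/spidev1.0"):
    """spidev_test invocation from the thread, with only --size varied."""
    return ["spidev_test", "--device", device, "--bpw", "8",
            "--size", str(size), "--cpha",
            "--iter", "10000000", "--speed", "10000000"]


for size in SIZES:
    print(f"# {size} bytes = {cachelines(size)} cacheline(s)")
    print(" ".join(build_cmd(size)))
```

Running each printed command before and after the patch gives one
coherent/non-coherent pair per transfer size.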

Arnd