Re: [PATCH v4 0/6] spi: spi-fsl-dspi: Target mode improvements
From: Vladimir Oltean
Date: Tue Jul 01 2025 - 10:03:17 EST
On Tue, Jul 01, 2025 at 01:42:46PM +0100, James Clark wrote:
> I wonder if latency could be higher despite increased throughput? It
> probably wouldn't be a big enough increase that anyone would care. And based
> on the structure of the driver if throughput is higher the latency might
> even be lower.
Actually, I do have a metric for that, sort of. I have a SPI-controlled
Ethernet switch with support for IEEE 1588, and synchronizing its
hardware clock over SPI benefits greatly from having a high precision
software timestamping point for the SPI transfers themselves.

Essentially, with XSPI FIFO mode we are able to provide a timestamping
granularity of $(FIFO size) words, see the spi_take_timestamp_pre() and
spi_take_timestamp_post() calls. Whereas with DMA, we let the core take
a message-level software timestamp which is much coarser, because at
driver level we can't guarantee a much more precise transmission time
interval for a particular requested byte. See __spi_pump_transfer_message().
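
To put rough numbers on the granularity argument, here is a minimal Python
sketch comparing the wire time covered by one FIFO-sized chunk with that of a
whole message. The FIFO depth, word size, message length and SCK frequency are
hypothetical values picked for illustration, not taken from the hardware or
from this thread, and the helper name wire_time_ns() is made up. Real
timestamping windows also include software and interrupt overhead, which this
ignores.

```python
NSEC_PER_SEC = 1_000_000_000


def wire_time_ns(words, bits_per_word, speed_hz):
    """Nanoseconds needed to clock `words` words onto the bus.

    Ignores inter-word gaps, chip select delays and any software overhead.
    """
    return words * bits_per_word * NSEC_PER_SEC // speed_hz


# Hypothetical parameters, for illustration only.
FIFO_WORDS = 4
MESSAGE_WORDS = 16
BITS_PER_WORD = 8
SPEED_HZ = 1_000_000

# Window that brackets the byte of interest in each timestamping scheme.
fifo_window_ns = wire_time_ns(FIFO_WORDS, BITS_PER_WORD, SPEED_HZ)
message_window_ns = wire_time_ns(MESSAGE_WORDS, BITS_PER_WORD, SPEED_HZ)

print(f"per-FIFO window:    {fifo_window_ns} ns")
print(f"per-message window: {message_window_ns} ns")
```

Under these assumed numbers the per-message window is four times wider than
the per-FIFO one, and it grows with message length while the per-FIFO window
does not.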
If you're not familiar with phc2sys, an interpretation of the logs below
is as follows.

phc2sys synchronizes the sw2p0 (target) clock to CLOCK_REALTIME (the
source clock). "delay" is the time, in nanoseconds, it took for the
kernel to read the target clock once, and the system clock twice (before
and after).

When software timestamps the SPI transfer that reads the hardware time,
this is called a "cross-timestamp". The smaller and less jittery this
delay, the more stable the cross-timestamp and the better software will
be able to discipline the target clock (i.e. the smaller the offsets
will be).
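
As a rough illustration of why the delay matters, the sketch below derives a
delay and an offset from one "system, target, system" reading triplet. It
assumes the target clock was sampled halfway between the two system clock
readings and that the offset is reported as target minus source; both are
assumptions made for this example rather than a description of phc2sys
internals, and the class name and sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CrossTimestamp:
    sys_before_ns: int  # system clock, read before the target clock
    phc_ns: int         # target (hardware) clock reading
    sys_after_ns: int   # system clock, read after the target clock

    @property
    def delay_ns(self):
        # Time spent reading the target clock, as seen by the system clock.
        return self.sys_after_ns - self.sys_before_ns

    @property
    def offset_ns(self):
        # Assume the target clock was sampled at the midpoint of the window.
        midpoint = self.sys_before_ns + self.delay_ns // 2
        return self.phc_ns - midpoint

    @property
    def max_error_ns(self):
        # The true sampling instant may be anywhere in the window, so the
        # midpoint assumption can be wrong by up to half the delay.
        return self.delay_ns // 2


# Hypothetical readings, in nanoseconds.
ts = CrossTimestamp(sys_before_ns=1_000_000, phc_ns=1_003_390,
                    sys_after_ns=1_006_400)
print(ts.delay_ns, ts.offset_ns, ts.max_error_ns)
```

A wider or more variable window means the midpoint guess is less reliable from
one reading to the next, which shows up as noise in the offsets.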
Before (XSPI FIFO mode):
$ phc2sys -s CLOCK_REALTIME -c sw2p0 -O 0 -m
phc2sys[38.432]: sw2p0 sys offset -1741272972548124929 s0 freq +0 delay 6720
phc2sys[39.434]: sw2p0 sys offset -1741272972548179141 s1 freq -54094 delay 5960
phc2sys[40.436]: sw2p0 sys offset 190 s2 freq -53904 delay 6001
phc2sys[41.437]: sw2p0 sys offset 306 s2 freq -53731 delay 6520
phc2sys[42.438]: sw2p0 sys offset 275 s2 freq -53670 delay 6401
phc2sys[43.441]: sw2p0 sys offset 350 s2 freq -53513 delay 6881
phc2sys[44.442]: sw2p0 sys offset -302 s2 freq -54060 delay 6321
phc2sys[45.444]: sw2p0 sys offset 35 s2 freq -53814 delay 6761
phc2sys[46.446]: sw2p0 sys offset -103 s2 freq -53941 delay 6481
phc2sys[47.447]: sw2p0 sys offset -43 s2 freq -53912 delay 6361
phc2sys[48.450]: sw2p0 sys offset 314 s2 freq -53568 delay 6960
phc2sys[49.451]: sw2p0 sys offset -310 s2 freq -54098 delay 6441
phc2sys[50.453]: sw2p0 sys offset -86 s2 freq -53967 delay 6321
phc2sys[51.455]: sw2p0 sys offset -5 s2 freq -53911 delay 6401
phc2sys[52.457]: sw2p0 sys offset -2 s2 freq -53910 delay 6320
phc2sys[53.458]: sw2p0 sys offset 77 s2 freq -53832 delay 6400
phc2sys[54.459]: sw2p0 sys offset -112 s2 freq -53997 delay 6240
phc2sys[55.461]: sw2p0 sys offset 66 s2 freq -53853 delay 6480
phc2sys[56.463]: sw2p0 sys offset -33 s2 freq -53932 delay 6441
phc2sys[57.465]: sw2p0 sys offset -33 s2 freq -53942 delay 6441
phc2sys[58.467]: sw2p0 sys offset 17 s2 freq -53902 delay 6440
phc2sys[59.468]: sw2p0 sys offset -14 s2 freq -53928 delay 6520
phc2sys[60.470]: sw2p0 sys offset -133 s2 freq -54051 delay 6281
phc2sys[61.472]: sw2p0 sys offset 8 s2 freq -53950 delay 6400
phc2sys[62.473]: sw2p0 sys offset 25 s2 freq -53931 delay 6400
phc2sys[63.474]: sw2p0 sys offset -113 s2 freq -54061 delay 6040
phc2sys[64.476]: sw2p0 sys offset 44 s2 freq -53938 delay 6281
phc2sys[65.477]: sw2p0 sys offset -17 s2 freq -53986 delay 6320
phc2sys[66.479]: sw2p0 sys offset -86 s2 freq -54060 delay 5841
phc2sys[67.480]: sw2p0 sys offset 141 s2 freq -53859 delay 6361
phc2sys[68.481]: sw2p0 sys offset -11 s2 freq -53968 delay 6320
phc2sys[69.483]: sw2p0 sys offset -15 s2 freq -53976 delay 6321
phc2sys[70.484]: sw2p0 sys offset -109 s2 freq -54074 delay 5960
phc2sys[71.486]: sw2p0 sys offset 115 s2 freq -53883 delay 6520
phc2sys[72.488]: sw2p0 sys offset -86 s2 freq -54049 delay 6280
phc2sys[73.489]: sw2p0 sys offset 234 s2 freq -53755 delay 6801
phc2sys[74.491]: sw2p0 sys offset -219 s2 freq -54138 delay 6361
^Cphc2sys[74.923]: sw2p0 sys offset -174 s2 freq -54159 delay 6360
After (DMA mode):
$ phc2sys -s CLOCK_REALTIME -c sw2p0 -O 0 -m
phc2sys[753.479]: sw2p0 sys offset 1882248595 s0 freq +32000000 delay 150440
phc2sys[754.482]: sw2p0 sys offset 1850232103 s1 freq +46787 delay 141960
phc2sys[755.483]: sw2p0 sys offset -33278 s2 freq +13509 delay 143160
phc2sys[756.485]: sw2p0 sys offset -5074 s2 freq +31729 delay 150040
phc2sys[757.486]: sw2p0 sys offset 11060 s2 freq +46341 delay 140240
phc2sys[758.488]: sw2p0 sys offset 4804 s2 freq +43403 delay 151320
phc2sys[759.489]: sw2p0 sys offset 10358 s2 freq +50398 delay 141879
phc2sys[760.491]: sw2p0 sys offset 409 s2 freq +43557 delay 148840
phc2sys[761.493]: sw2p0 sys offset 3863 s2 freq +47133 delay 143360
phc2sys[762.494]: sw2p0 sys offset 259 s2 freq +44688 delay 145840
phc2sys[763.496]: sw2p0 sys offset 1849 s2 freq +46356 delay 141000
phc2sys[764.497]: sw2p0 sys offset -1800 s2 freq +43262 delay 144160
phc2sys[765.499]: sw2p0 sys offset -184 s2 freq +44338 delay 139880
phc2sys[766.501]: sw2p0 sys offset -1677 s2 freq +42790 delay 146120
phc2sys[767.502]: sw2p0 sys offset 2529 s2 freq +46492 delay 141040
phc2sys[768.504]: sw2p0 sys offset -4368 s2 freq +40354 delay 151240
phc2sys[769.505]: sw2p0 sys offset 1112 s2 freq +44524 delay 147680
phc2sys[770.507]: sw2p0 sys offset 3002 s2 freq +46747 delay 142960
phc2sys[771.509]: sw2p0 sys offset -899 s2 freq +43747 delay 145440
phc2sys[772.510]: sw2p0 sys offset -2003 s2 freq +42373 delay 148360
phc2sys[773.512]: sw2p0 sys offset 3675 s2 freq +47450 delay 141440
phc2sys[774.514]: sw2p0 sys offset -1417 s2 freq +43461 delay 144960
phc2sys[775.515]: sw2p0 sys offset 802 s2 freq +45255 delay 142559
phc2sys[776.517]: sw2p0 sys offset 1368 s2 freq +46061 delay 140040
phc2sys[777.518]: sw2p0 sys offset -1897 s2 freq +43207 delay 141840
phc2sys[778.520]: sw2p0 sys offset -774 s2 freq +43761 delay 141680
phc2sys[779.522]: sw2p0 sys offset -1715 s2 freq +42587 delay 145199
phc2sys[780.523]: sw2p0 sys offset 4045 s2 freq +47833 delay 134839
phc2sys[781.525]: sw2p0 sys offset -4809 s2 freq +40192 delay 146840
phc2sys[782.526]: sw2p0 sys offset 363 s2 freq +43922 delay 144759
phc2sys[783.528]: sw2p0 sys offset 3328 s2 freq +46996 delay 140240
phc2sys[784.530]: sw2p0 sys offset -293 s2 freq +44373 delay 142480
phc2sys[785.531]: sw2p0 sys offset 46 s2 freq +44624 delay 142000
phc2sys[786.533]: sw2p0 sys offset -3422 s2 freq +41170 delay 148080
phc2sys[787.534]: sw2p0 sys offset 2932 s2 freq +46497 delay 140720
phc2sys[788.536]: sw2p0 sys offset -1961 s2 freq +42484 delay 147040
phc2sys[789.537]: sw2p0 sys offset -945 s2 freq +42912 delay 149160
phc2sys[790.539]: sw2p0 sys offset 3221 s2 freq +46794 delay 143040
phc2sys[791.541]: sw2p0 sys offset 41 s2 freq +44580 delay 144160
phc2sys[792.542]: sw2p0 sys offset -748 s2 freq +43804 delay 145120
Here, the synchronization offsets in DMA mode are an order of magnitude
worse, so yeah, my initial enthusiasm is definitely curbed now.
For me, what matters is not the latency itself, but the ability to
cross-timestamp one byte within the SPI transfer with high granularity,
and for the uncertainty of that timestamp to be as small and constant as
possible.
For comparison, here is a third output log, taken in XSPI FIFO mode
but with "ctlr->ptp_sts_supported = true" removed. That causes the core
to take message-level software timestamps, which are a better indicator
of latency.

In the log below, you can see that in FIFO mode, the minimum delay is
much smaller (about 107 us) but the spread is larger (the maximum is
about 209 us). In DMA mode, the latencies are much more stable (roughly
135 to 151 us in the log above). But despite this, XSPI is still better
for the ability to zoom in on the particular byte of interest.
$ phc2sys -s CLOCK_REALTIME -c sw2p0 -O 0 -m
phc2sys[246.568]: sw2p0 sys offset 2872475 s0 freq -88840 delay 131332
phc2sys[247.571]: sw2p0 sys offset 2874267 s1 freq -87052 delay 194739
phc2sys[248.572]: sw2p0 sys offset 71966 s2 freq -15086 delay 114971
phc2sys[249.573]: sw2p0 sys offset 34792 s2 freq -30670 delay 108331
phc2sys[250.575]: sw2p0 sys offset -39553 s2 freq -94578 delay 208580
phc2sys[251.577]: sw2p0 sys offset 50369 s2 freq -16521 delay 107410
phc2sys[252.578]: sw2p0 sys offset 1597 s2 freq -50183 delay 128292
phc2sys[253.579]: sw2p0 sys offset 6685 s2 freq -44616 delay 107810
phc2sys[254.581]: sw2p0 sys offset -4102 s2 freq -53397 delay 108530
phc2sys[255.582]: sw2p0 sys offset -7256 s2 freq -57782 delay 112051
phc2sys[256.584]: sw2p0 sys offset -2910 s2 freq -55613 delay 108610
phc2sys[257.586]: sw2p0 sys offset -52981 s2 freq -106557 delay 209460
phc2sys[258.587]: sw2p0 sys offset 49914 s2 freq -19556 delay 107130
phc2sys[259.589]: sw2p0 sys offset -29913 s2 freq -84409 delay 195699
phc2sys[260.591]: sw2p0 sys offset 42439 s2 freq -21031 delay 110411
phc2sys[261.592]: sw2p0 sys offset 3048 s2 freq -47690 delay 120571
phc2sys[262.594]: sw2p0 sys offset -853 s2 freq -50676 delay 113291
phc2sys[263.596]: sw2p0 sys offset -35260 s2 freq -85339 delay 173937
phc2sys[264.597]: sw2p0 sys offset 26479 s2 freq -34178 delay 110570
phc2sys[265.599]: sw2p0 sys offset -36802 s2 freq -89516 delay 195699
phc2sys[266.601]: sw2p0 sys offset 39945 s2 freq -23809 delay 110571
phc2sys[267.603]: sw2p0 sys offset -32036 s2 freq -83807 delay 194858
phc2sys[268.604]: sw2p0 sys offset 37721 s2 freq -23661 delay 110570
phc2sys[269.606]: sw2p0 sys offset 5110 s2 freq -44955 delay 112571
phc2sys[270.607]: sw2p0 sys offset -3526 s2 freq -52058 delay 109570
phc2sys[271.608]: sw2p0 sys offset -7856 s2 freq -57446 delay 112491
phc2sys[272.610]: sw2p0 sys offset -5259 s2 freq -57206 delay 112051
phc2sys[273.612]: sw2p0 sys offset -43272 s2 freq -96797 delay 194178
phc2sys[274.613]: sw2p0 sys offset 40708 s2 freq -25798 delay 108291
phc2sys[275.615]: sw2p0 sys offset -38753 s2 freq -93047 delay 208900
phc2sys[276.616]: sw2p0 sys offset 47948 s2 freq -17972 delay 111050
phc2sys[277.618]: sw2p0 sys offset 10692 s2 freq -40843 delay 111131
phc2sys[278.619]: sw2p0 sys offset -2179 s2 freq -50507 delay 108530
phc2sys[279.620]: sw2p0 sys offset -8143 s2 freq -57124 delay 111571
phc2sys[280.623]: sw2p0 sys offset -49486 s2 freq -100910 delay 199179
phc2sys[281.625]: sw2p0 sys offset -3684 s2 freq -69954 delay 199419
phc2sys[282.626]: sw2p0 sys offset 54475 s2 freq -12900 delay 111651
phc2sys[283.628]: sw2p0 sys offset -36562 s2 freq -87595 delay 209420
^Cphc2sys[284.181]: sw2p0 sys offset -11239 s2 freq -73240 delay 194499
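
For anyone who wants to reproduce the minimum/maximum figures from logs like
these, a small Python sketch follows. The regular expression is written
against the line format shown in this email only, the function names are made
up for the example, and the sample input is a handful of lines copied from the
third log above. Only samples in servo state s2 (locked) are summarized.

```python
import re

# Matches lines such as:
# phc2sys[249.573]: sw2p0 sys offset 34792 s2 freq -30670 delay 108331
LINE_RE = re.compile(
    r"phc2sys\[(?P<time>[\d.]+)\]:\s+\S+\s+sys\s+offset\s+(?P<offset>-?\d+)"
    r"\s+s(?P<state>\d)\s+freq\s+(?P<freq>[+-]?\d+)\s+delay\s+(?P<delay>\d+)"
)


def parse_phc2sys(text):
    """Return one dict per recognized log line; other lines are skipped."""
    samples = []
    for line in text.splitlines():
        m = LINE_RE.search(line)  # search() tolerates a leading "^C"
        if m:
            samples.append({
                "state": int(m["state"]),
                "offset": int(m["offset"]),
                "delay": int(m["delay"]),
            })
    return samples


def summarize(samples):
    """Summarize delay and offset over locked (s2) samples only."""
    locked = [s for s in samples if s["state"] == 2]
    delays = [s["delay"] for s in locked]
    return {
        "delay_min": min(delays),
        "delay_max": max(delays),
        "delay_spread": max(delays) - min(delays),
        "offset_abs_max": max(abs(s["offset"]) for s in locked),
    }


SAMPLE = """\
phc2sys[246.568]: sw2p0 sys offset 2872475 s0 freq -88840 delay 131332
phc2sys[249.573]: sw2p0 sys offset 34792 s2 freq -30670 delay 108331
phc2sys[250.575]: sw2p0 sys offset -39553 s2 freq -94578 delay 208580
phc2sys[251.577]: sw2p0 sys offset 50369 s2 freq -16521 delay 107410
"""

summary = summarize(parse_phc2sys(SAMPLE))
print(summary)
```

Feeding the complete logs through parse_phc2sys() instead of the four-line
sample gives the per-mode figures to compare.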