Re: [PATCH v4 2/6] spi: spi-fsl-dspi: Store status directly in cur_msg->status

From: James Clark
Date: Tue Jul 01 2025 - 06:06:15 EST




On 30/06/2025 9:41 pm, Vladimir Oltean wrote:
On Mon, Jun 30, 2025 at 01:54:11PM +0100, James Clark wrote:
On 27/06/2025 10:30 pm, Vladimir Oltean wrote:
On Fri, Jun 27, 2025 at 11:21:38AM +0100, James Clark wrote:
This will allow us to return a status from the interrupt handler in a
later commit and avoids copying it at the end of
dspi_transfer_one_message(). For consistency make polling and DMA modes
use the same mechanism.

Refactor dspi_rxtx() and dspi_poll() to not return -EINPROGRESS because
this isn't actually a status that was ever returned to the core layer
but some internal state. Wherever that was used we can look at dspi->len
instead.

No functional changes intended.

Signed-off-by: James Clark <james.clark@xxxxxxxxxx>
---

This commit doesn't work, please do not merge this patch.

You are changing the logic in DMA mode, interrupt-based FIFO and PIO all
in one go, in a commit whose title and primary purpose is unrelated to
that. Just a mention of the type "while at it, also do that". And in
that process, that bundled refactoring introduces a subtle, but severe bug.

No, that is discouraged. Make one patch per logical change, where only
one thing is happening and which is obviously correct. It helps you and
it helps the reviewer.

Please find attached a set of 3 patches that represent a broken down and
corrected variant of this one. First 2 should be squashed together in
your next submission, they are just to illustrate the bug that you've
introduced (which can be reproduced on any SoC in XSPI mode).


Thanks for the debugging, yes it looks like the patches could be broken down
a bit.

Just for clarity, is this bug affecting host+polling mode? I can see the
logic bug in dspi_poll() which I must have tested less thoroughly, but I
can't actually see any difference in dspi_interrupt().

It should affect both, I tested your patches unmodified, i.e. interrupt
based XSPI FIFO mode (in master mode).

Assume (not real numbers, just for explanation's sake) dspi->len is 2
(2 FIFO sizes worth of 32-bit words, but let's assume for simplicity
that each dspi_pop_tx() call simply decrements the len by 1).

The correct behavior would be this:

dspi_transfer_one_message()
-> dspi->len = 2
-> dspi_fifo_write()
-> dspi_xspi_fifo_write()
-> dspi_pop_tx()
-> dspi->len = 1
-> wait_for_completion(&dspi->xfer_done)
<IRQ>
dspi_interrupt()
-> dspi_rxtx()
-> dspi_fifo_read()
-> dspi_fifo_write()
-> dspi_xspi_fifo_write()
-> dspi_pop_tx()
-> dspi->len = 0
<IRQ>
dspi_interrupt()
-> dspi_rxtx()
-> dspi_fifo_read()
-> complete(&dspi->xfer_done)
-> reinit_completion(&dspi->xfer_done)

but the behavior with your proposed logic is this:

dspi_transfer_one_message()
-> dspi->len = 2
-> dspi_fifo_write()
-> dspi_xspi_fifo_write()
-> dspi_pop_tx()
-> dspi->len = 1
-> wait_for_completion(&dspi->xfer_done)
<IRQ>
dspi_interrupt()
-> dspi_rxtx()
-> dspi_fifo_read()
-> dspi_fifo_write()
-> dspi_xspi_fifo_write()
-> dspi_pop_tx()
-> dspi->len = 0
-> complete(&dspi->xfer_done)
-> reinit_completion(&dspi->xfer_done)
<IRQ>
dspi_interrupt()
-> Second interrupt is spurious at
this point, since the process
context may have proceeded
to change pointers in
dspi->cur_transfer, etc.

Clearer now? Essentially the complete() call is premature, it needs to
be not after the dspi_fifo_write() call, but after its subsequent
dspi_fifo_read(), which comes after yet another IRQ, in the IRQ-triggered
path.


Much clearer, thanks. Not sure how I missed that, maybe a confusion about whether it was dspi_fifo_read() or dspi_fifo_write() that modifies dspi->len.

Not sure why you are not able to reproduce this, maybe luck had it that
the complete() call never woke up the process context earlier than the
second IRQ in the above case triggered?

I'm not doing anything special in particular, just booted a board with a
SPI device driver (sja1105). This transfers some sequences of relatively
large buffers (256 bytes) at probe time, maybe that exercises the
controller driver more than the average peripheral driver.

It's strange because I was stressing it quite a lot, especially with the performance testing.