Re: [PATCH v1 3/7] spi: qspi-tegra: Add support for Tegra210 QSPI controller

From: Sowjanya Komatineni
Date: Fri Dec 04 2020 - 16:05:57 EST



On 12/4/20 10:52 AM, Mark Brown wrote:
On Thu, Dec 03, 2020 at 04:22:54PM -0800, Sowjanya Komatineni wrote:
On 12/2/20 11:17 AM, Sowjanya Komatineni wrote:
It seems weird that this device needs us to do a memcpy() to do DMA,
most devices are able to DMA directly from the buffers provided by the
SPI API (and let the SPI core sync things).  What is going on here?
For transfers of size more than max DMA transfer limit, data transfer
happens in multiple iterations with each iteration transferring up to
max DMA transfer limit.
So we use separate DMA buffers and on every iteration copy them to the SPI
core provided tx/rx buffers.
I don't understand this - there's no restriction on where DMA transfers
can be done from within a DMA mapped region, the driver can do multiple
transfers from different chunks of the source buffer without having to
copy anything. That's a very common scenario.
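
For illustration only, a rough sketch (not from this patch) of what transferring
directly from a DMA-mapped buffer in max-sized chunks could look like;
QSPI_DMA_MAX_LEN, struct tegra_qspi and tegra_qspi_start_dma_chunk() are
placeholder names:

#include <linux/dma-mapping.h>
#include <linux/minmax.h>
#include <linux/sizes.h>

#define QSPI_DMA_MAX_LEN	SZ_64K	/* assumed per-iteration HW limit */

/* Placeholder for the driver routine that kicks off one DMA transfer. */
static int tegra_qspi_start_dma_chunk(struct tegra_qspi *tqspi,
				      dma_addr_t addr, unsigned int len);

static int tegra_qspi_dma_xfer_chunks(struct tegra_qspi *tqspi,
				      dma_addr_t dma_addr, unsigned int len)
{
	unsigned int offset = 0;

	while (offset < len) {
		unsigned int chunk = min_t(unsigned int, len - offset,
					   QSPI_DMA_MAX_LEN);
		int ret;

		/* DMA straight out of the mapped SPI core buffer; no bounce
		 * buffer or memcpy() is needed between iterations. */
		ret = tegra_qspi_start_dma_chunk(tqspi, dma_addr + offset, chunk);
		if (ret)
			return ret;

		offset += chunk;
	}

	return 0;
}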

Also, unpacked mode needs to manually put the bytes together from the read data
into the SPI core rx buffer.
Could you be more explicit here, I don't know what "unpack mode" is?

Tegra SPI/QSPI controllers support packed mode and unpacked mode based on the bits per word of a transfer.

Packed mode: when enabled, all 32 bits of data in the FIFO contain valid data packets of 8-bit/16-bit/32-bit length.

Non-packed mode: for transfers like 24-bit data, for example, we disable packed mode and only 24 bits of each FIFO word are valid; the other bits are 0's.
So during TX FIFO filling, and during receive when the FIFO data is read, SW needs to skip the invalid bits and align the byte order from/to the SPI core tx/rx buffers.
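
For illustration only, a simplified sketch of the non-packed RX path described
above (assuming one transfer word per 32-bit FIFO word with the upper bits
zero-padded; this is not the patch code and the helper name is made up):

#include <linux/kernel.h>
#include <linux/types.h>

static void tegra_qspi_unpack_rx(u8 *rx_buf, const u32 *fifo_words,
				 unsigned int nwords,
				 unsigned int bits_per_word)
{
	unsigned int bytes_per_word = DIV_ROUND_UP(bits_per_word, 8);
	unsigned int i, j;

	for (i = 0; i < nwords; i++) {
		u32 word = fifo_words[i];

		/* Keep only the valid bytes of each FIFO word and drop the
		 * zero padding so the rx buffer stays contiguous. */
		for (j = 0; j < bytes_per_word; j++)
			*rx_buf++ = (word >> (j * 8)) & 0xff;
	}
}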


This is worrying, the client device might be confused if /CS is doing
things outside of the standard handling.
Do you mean to honor spi_transfer cs_change flag?
At least, yes - more generally just if there's any feature to do with chip
select then the driver will need to open code it. The driver should at
least be double checking that what it's doing matches what it was told
to do, though just letting this be handled by the generic code if
there's no limitation on the hardware tends to be easier all round.
OK. Will honor the cs_change flag at the end of a transfer and will program the CS state based on that.
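
As a rough sketch of that (tegra_qspi_set_cs() is a hypothetical helper and the
real handling would live in the transfer loop; this is only illustrative):

#include <linux/list.h>
#include <linux/spi/spi.h>

/* Hypothetical helper that asserts/deasserts the chip select. */
static void tegra_qspi_set_cs(struct spi_device *spi, bool enable);

static void tegra_qspi_end_transfer_cs(struct spi_device *spi,
				       struct spi_message *msg,
				       struct spi_transfer *xfer)
{
	bool is_last = list_is_last(&xfer->transfer_list, &msg->transfers);

	if (xfer->cs_change) {
		/* cs_change on a non-final transfer: toggle CS between
		 * transfers; on the final transfer: leave CS asserted. */
		if (!is_last)
			tegra_qspi_set_cs(spi, false);
	} else if (is_last) {
		/* Normal end of message: release the chip select. */
		tegra_qspi_set_cs(spi, false);
	}
}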

Tegra QSPI is a master and is used only with QSPI flash devices. Looking
at the SPI NOR driver, I see QSPI flash commands are executed with one flash
command per spi_message and I don't see cs_change flag usage w.r.t. QSPI
flash. So, using SW based CS control for QSPI.
Please correct me if I am missing something here.
Someone might build a system that does something different, they may see
a spare SPI controller and decide they can use it for something else or
there may be some future change with the flash code which does something
different.

+    tegra_qspi_writel(tqspi, tqspi->def_command2_reg, QSPI_COMMAND2);
The setup for one device shouldn't be able to affect the operation of
another, already running, device so either these need to be configured
as part of the controller probe or these configurations need to be
deferred until we're actually doing a transfer.
We will only have one device on QSPI as we only support a single chip select.
It's quite common for people to do things like add additional devices
with GPIO chip selects.
Will move tap delay programming to happen during the SPI transfer.
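
For example, a minimal sketch of deferring the per-device QSPI_COMMAND2 write
into the transfer path instead of setup(), so configuring one device cannot
disturb another that is mid-transfer (tegra_qspi_writel() and QSPI_COMMAND2
follow the patch; tegra_qspi_compute_command2() is a hypothetical helper):

/* Called while starting a transfer for this device, not from setup(). */
static void tegra_qspi_apply_device_config(struct tegra_qspi *tqspi,
					   struct spi_device *spi)
{
	u32 command2 = tegra_qspi_compute_command2(tqspi, spi);

	tegra_qspi_writel(tqspi, command2, QSPI_COMMAND2);
}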

+    /*
+     * Tegra QSPI hardware supports dummy bytes transfer based on the
+     * programmed dummy clock cycles in the QSPI register.
+     * So, get the total dummy bytes from the dummy bytes transfer in
+     * the spi_message and convert to dummy clock cycles.
+     */
+    list_for_each_entry(xfer, &msg->transfers, transfer_list) {
+        if (ntransfers == DUMMY_BYTES_XFER &&
+            !(list_is_last(&xfer->transfer_list, &msg->transfers)))
+            dummy_cycles = xfer->len * 8 / xfer->tx_nbits;
+        ntransfers++;
+    }
This seems weird, there's some hard coded assumption about particular
patterns that the client device is going to send.  What's going on here?
I don't really understand what this is trying to do.
QSPI flash needs dummy cycles for the data read operation, which is actually
the initial read latency, and the number of dummy cycles required is vendor
specific.
The SPI NOR driver gets the required dummy cycles based on mode clock cycles and
wait state clock cycles.
During read operations, spi_nor_spimem_read_data() converts dummy cycles
to a number of dummy bytes.
The Tegra QSPI controller supports a dummy clock cycles register and, when
programmed, the QSPI controller sends the dummy bytes itself rather than SW
handling extra cycles for transferring dummy bytes.
The above equation converts these dummy bytes back to dummy clock cycles to
program into the QSPI register and avoid a manual SW transfer of dummy bytes.
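For example (illustrative numbers only): an 8-byte dummy transfer sent on a
quad (x4) data line works out to 8 * 8 / 4 = 16 dummy clock cycles for the
QSPI register.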
This is not a good idea, attempting to reverse engineer the message and
guess at the contents isn't going to be robust and if it's useful it
will if nothing else lead to a bunch of duplicated code in drivers as
every device that has this feature will need to reimplement it. Instead
we should extend the framework so there's explicit support for
specifying transfers that are padding bytes, then there's no guesswork
that can go wrong and no duplicated code between drivers. A flag in the
transfer struct might work?

As per the QSPI spec, dummy bytes for the initial read latency are always FF's, so it's not like guessing the contents.

The Tegra QSPI controller HW supports transferring dummy bytes (sending FF's after the address) based on the programmed dummy clock cycles.

To allow the Tegra QSPI HW to transfer dummy bytes directly, the controller driver needs the number of dummy bytes / actual dummy clock cycles, which the core driver gets from the flash SFDP read data.

So, we can add a flag to the transfer, and based on this flag, if the controller HW supports it, we can skip filling dummy bytes in spi_mem_exec_op(), but the controller driver still needs the dummy_cycles value. So, along with the flag, do you agree to have dummy_cycle as part of the transfer struct, which can be set to the nor->read_dummy value?
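
For illustration, a sketch of how a controller driver could consume such a
proposed flag (the dummy_data field and the tqspi->dummy_cycles field do not
exist today; the names are only suggestions, not an existing kernel API):

#include <linux/list.h>
#include <linux/spi/spi.h>

static void tegra_qspi_scan_dummy(struct tegra_qspi *tqspi,
				  struct spi_message *msg)
{
	struct spi_transfer *xfer;

	tqspi->dummy_cycles = 0;

	list_for_each_entry(xfer, &msg->transfers, transfer_list) {
		/* Proposed flag: this transfer carries only padding bytes,
		 * so let the controller clock them out itself. */
		if (xfer->dummy_data)
			tqspi->dummy_cycles = xfer->len * 8 / xfer->tx_nbits;
	}
}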


Thanks

Sowjanya