[RFC PATCH v2 RESEND] drivers: ata: ahci_sunxi: Increased SATA/AHCI DMA TX/RX FIFOs

From: Uenal Mutlu
Date: Sun May 12 2019 - 17:14:55 EST


Increasing the SATA/AHCI DMA TX/RX FIFOs (P0DMACR.TXTS and .RXTS, i.e.
TX_TRANSACTION_SIZE and RX_TRANSACTION_SIZE) from the default 0x0 each
to 0x3 each raises write performance to 120 MiB/s to 132 MiB/s,
up from 36 MiB/s to 45 MiB/s previously.
Read performance is about 200 MiB/s.
[Tested on an SSD using dd with bs=2K/4K/8K/12K/16K/24K/32K; peak
performance at bs=12K.]

Tested on the Banana Pi R1 (aka Lamobo R1) and Banana Pi M1 SBCs
with Allwinner A20 32bit-SoCs (ARMv7-a / arm-linux-gnueabihf).
These are small, Raspberry Pi-like single-board computers.

Slow SATA write speed on these small devices has been a known problem
for more than 5 years. Over the years, many commentators wrongly
assumed that the slow write speed was a hardware limitation. This patch
finally solves the problem, which was in fact a hard-to-fix software
problem (because of the lack of documentation from the SoC maker,
Allwinner Technology).

RFC: Since roughly 25 or more similar SBC/SoC models use the
ahci_sunxi driver, users are encouraged to test the patch on all
affected boards and give feedback.

Lists of the affected sunxi and other boards and SoCs with SATA using
the ahci_sunxi driver:
$ grep -i -e "^&ahci" arch/arm/boot/dts/sun*dts
and http://linux-sunxi.org/SATA#Devices_with_SATA_ports
See also http://linux-sunxi.org/Category:Devices_with_SATA_port

Patch v2:
- Commented the patch in-place in ahci_sunxi.c
- With bs=12K and no conv=... passed to dd, the write performance
  rises further to 132 MiB/s
- Changed MB/s to MiB/s
- Posted the story behind the patch:
  http://lkml.iu.edu/hypermail/linux/kernel/1905.1/03506.html
- Posted a dd test script to find optimal bs, and some results:
  https://bit.ly/2YoOzEM

Patch v1:
- States bs=4K for dd and a write performance of 120 MiB/s

Signed-off-by: Uenal Mutlu <um@xxxxxxxxxxx>
---
drivers/ata/ahci_sunxi.c | 47 +++++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 45 insertions(+), 2 deletions(-)
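
Note for reviewers: the effect of the changed mask and value on the
individual P0DMACR fields can be checked with a small userspace sketch.
It is not part of the patch. It assumes that the A20 register follows the
field layout of the TI guide cited in the code comment (TXTS in bits 3:0,
RXTS in bits 7:4, TXABL in bits 11:8, RXABL in bits 15:12) and that
sunxi_clrsetbits() is a plain read/clear/set/write sequence. The names
p0dmacr_field(), clrsetbits_model() and p0dmacr_print() are hypothetical
helpers made up for this illustration.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Bit offsets of the 4-bit P0DMACR fields (assumed from the TI guide). */
enum {
	P0DMACR_TXTS_SHIFT  = 0,
	P0DMACR_RXTS_SHIFT  = 4,
	P0DMACR_TXABL_SHIFT = 8,
	P0DMACR_RXABL_SHIFT = 12,
};

/* Hypothetical helper: extract one 4-bit field from a register value. */
uint32_t p0dmacr_field(uint32_t reg_val, unsigned int shift)
{
	return (reg_val >> shift) & 0xfu;
}

/*
 * Plain-value model of a clear-then-set register update: the bits in
 * clr_val are cleared first, then the bits in set_val are ORed in.
 * Bits outside clr_val keep their previous value.
 */
uint32_t clrsetbits_model(uint32_t reg_val, uint32_t clr_val,
			  uint32_t set_val)
{
	/* In this sketch, every bit that is set must also be in the mask. */
	assert((set_val & ~clr_val) == 0);

	reg_val &= ~clr_val;
	reg_val |= set_val;
	return reg_val;
}

/* Hypothetical helper: print a register value split into its fields. */
void p0dmacr_print(const char *label, uint32_t reg_val)
{
	printf("%s: 0x%08" PRIx32 " TXTS=0x%" PRIx32 " RXTS=0x%" PRIx32
	       " TXABL=0x%" PRIx32 " RXABL=0x%" PRIx32 "\n",
	       label, reg_val,
	       p0dmacr_field(reg_val, P0DMACR_TXTS_SHIFT),
	       p0dmacr_field(reg_val, P0DMACR_RXTS_SHIFT),
	       p0dmacr_field(reg_val, P0DMACR_TXABL_SHIFT),
	       p0dmacr_field(reg_val, P0DMACR_RXABL_SHIFT));
}
```

Starting from a register value of 0x0, the old call (mask 0x0000ff00,
value 0x00004400) gives 0x00004400 in this model, so TXTS and RXTS stay
at 0x0. The new call (mask 0x0000ffff, value 0x00004433) gives 0x00004433,
so TXTS and RXTS become 0x3 while TXABL and RXABL remain 0x4.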

diff --git a/drivers/ata/ahci_sunxi.c b/drivers/ata/ahci_sunxi.c
index 911710643305..ed19f19808c5 100644
--- a/drivers/ata/ahci_sunxi.c
+++ b/drivers/ata/ahci_sunxi.c
@@ -157,8 +157,51 @@ static void ahci_sunxi_start_engine(struct ata_port *ap)
void __iomem *port_mmio = ahci_port_base(ap);
struct ahci_host_priv *hpriv = ap->host->private_data;

- /* Setup DMA before DMA start */
- sunxi_clrsetbits(hpriv->mmio + AHCI_P0DMACR, 0x0000ff00, 0x00004400);
+ /* Setup DMA before DMA start
+ *
+ * NOTE: A similar SoC with SATA/AHCI by Texas Instruments documents
+ * this Vendor Specific Port (P0DMACR, aka PxDMACR) in its
+ * User's Guide document (TMS320C674x/OMAP-L1x Processor
+ * Serial ATA (SATA) Controller, Literature Number: SPRUGJ8C,
+ * March 2011, Chapter 4.33 Port DMA Control Register (P0DMACR),
+ * p.68, https://www.ti.com/lit/ug/sprugj8c/sprugj8c.pdf)
+ * as equivalent to the following struct:
+ *
+ * struct AHCI_P0DMACR_t
+ * {
+ * unsigned TXTS : 4,
+ * RXTS : 4,
+ * TXABL : 4,
+ * RXABL : 4,
+ * Reserved : 16;
+ * };
+ *
+ * TXTS: Transmit Transaction Size (TX_TRANSACTION_SIZE).
+ * This field defines the DMA transaction size in DWORDs for
+ * transmit (system bus read, device write) operation. [...]
+ *
+ * RXTS: Receive Transaction Size (RX_TRANSACTION_SIZE).
+ * This field defines the Port DMA transaction size in DWORDs
+ * for receive (system bus write, device read) operation. [...]
+ *
+ * TXABL: Transmit Burst Limit.
+ * This field allows software to limit the VBUSP master read
+ * burst size. [...]
+ *
+ * RXABL: Receive Burst Limit.
+ * Allows software to limit the VBUSP master write burst
+ * size. [...]
+ *
+ * Reserved: Reserved.
+ *
+ *
+ * NOTE: According to the above document, the following alternative
+ * to the code below could perhaps be a better option
+ * (or preparation) for possible further improvements later:
+ * sunxi_clrsetbits(hpriv->mmio + AHCI_P0DMACR, 0x0000ffff,
+ * 0x00000033);
+ */
+ sunxi_clrsetbits(hpriv->mmio + AHCI_P0DMACR, 0x0000ffff, 0x00004433);

/* Start DMA */
sunxi_setbits(port_mmio + PORT_CMD, PORT_CMD_START);
--
2.11.0