Re: g_NCR5380 PDMA, was Re: [PATCH 0/6] ncr5380: Miscellaneous minor patches

From: Finn Thain
Date: Fri Feb 17 2017 - 17:36:19 EST



On Thu, 16 Feb 2017, Ondrej Zary wrote:

> On Tuesday 31 January 2017 02:31:45 Finn Thain wrote:
> [...]
> > Are you trying to figure out which commands are going to disconnect
> > during a transfer? This is really a function of the firmware in the
> > target; there are no good heuristics AFAICT, so the PDMA algorithm has
> > to be robust. mac_scsi has to cope with this too.
> >
> > Does the problem go away when you assign no IRQ? When instance->irq ==
> > NO_IRQ, the core driver will inhibit disconnect privileges.
>
> Yes, it seems to run fine with IRQ/disconnect disabled. With IRQ
> enabled, "dd if=/dev/sr0 of=anything" stops after a while.

When you say "stops", do you mean an infinite loop in
generic_NCR5380_pread or does the loop complete (which would presumably
leave the target stuck in DATA IN phase, and a scsi command timeout would
probably follow after 30 seconds...)

> I get gated 53C80 IRQ, BASR=0x10, MODE=0x0e, STATUS=0x7c.
>

You can use NCR5380_print() to get a decoded register dump.

When I decode the above values I get,

BASR = 0x10 = BASR_IRQ
MODE = 0x0e = MR_ENABLE_EOP_INTR | MR_MONITOR_BSY | MR_DMA_MODE
STATUS = 0x7c = SR_BSY | SR_REQ | SR_MSG | SR_CD | SR_IO

Since BASR_PHASE_MATCH is not set, the interrupt is almost certainly a
phase mismatch. The new phase is SR_MSG | SR_CD | SR_IO = PHASE_MSGIN,
which shows that the target has given up on the DATA IN phase and is
presumably trying to send a DISCONNECT message.

> When I enable interrupts during PDMA (like the removed UNSAFE macro
> did), the problems go away. I see an IRQ after each pread call.

You shouldn't need an interrupt, because NCR5380_dma_complete() gets
called regardless. AFAICT, the only difference is the timing, which
becomes less predictable. Calling spinlock_irq_restore() before the
transfer seems to create a race condition between hostdata->dma_len store
and load.

I think that the pread call has not yet returned when the interrupt fires
and NCR5380_dma_complete() is called. Hence hostdata->dma_len has not yet
been initialized. So when NCR5380_dma_complete() is called by the
interrupt handler, SCp.this_residual will not change at all because
hostdata->dma_len is still zero.

> (had to disable "no 53C80 gated irq after transfer" and "no end dma
> signal" messages to reduce log spam)
>

That may provide a quick way to detect the failed PDMA transfer, at least
for testing purposes. There may be a more conclusive test for a partial
transfer.

We could to implement something like macscsi_dma_residual() to take care
of a failed PDMA transfer. That way, the failure can be taken into account
when NCR5380_dma_complete() is called at the end of the transfer.

See patch below for example. It should confirm the theory above though it
really needs some timeouts added (NCR5380_poll_politely()). And perhaps we
could do something more clever than retry indefinitely, though we can
still rely on the command timeout.

diff --git a/drivers/scsi/g_NCR5380.c b/drivers/scsi/g_NCR5380.c
index 6f9665d..75cfaf3 100644
--- a/drivers/scsi/g_NCR5380.c
+++ b/drivers/scsi/g_NCR5380.c
@@ -497,15 +497,17 @@ static inline int generic_NCR5380_pread(struct NCR5380_hostdata *hostdata,
blocks--;
}

+ hostdata->pdma_residual = 0;
+
if (!(NCR5380_read(hostdata->c400_ctl_status) & CSR_GATED_53C80_IRQ))
- printk("53C400r: no 53C80 gated irq after transfer");
+ hostdata->pdma_residual = hostdata->dma_len;

/* wait for 53C80 registers to be available */
while (!(NCR5380_read(hostdata->c400_ctl_status) & CSR_53C80_REG))
;

if (!(NCR5380_read(BUS_AND_STATUS_REG) & BASR_END_DMA_TRANSFER))
- printk(KERN_ERR "53C400r: no end dma signal\n");
+ hostdata->pdma_residual = hostdata->dma_len;

return 0;
}
@@ -576,12 +578,14 @@ static inline int generic_NCR5380_pwrite(struct NCR5380_hostdata *hostdata,
/* FIXME - no timeout */
}

- if (!(NCR5380_read(BUS_AND_STATUS_REG) & BASR_END_DMA_TRANSFER)) {
- printk(KERN_ERR "53C400w: no end dma signal\n");
- }
+ hostdata->pdma_residual = 0;
+
+ if (!(NCR5380_read(BUS_AND_STATUS_REG) & BASR_END_DMA_TRANSFER))
+ hostdata->pdma_residual = hostdata->dma_len;

while (!(NCR5380_read(TARGET_COMMAND_REG) & TCR_LAST_BYTE_SENT))
; // TIMEOUT
+
return 0;
}

@@ -607,6 +611,11 @@ static int generic_NCR5380_dma_xfer_len(struct NCR5380_hostdata *hostdata,
return transfersize;
}

+static int generic_NCR5380_dma_residual(struct NCR5380_hostdata *hostdata)
+{
+ return hostdata->pdma_residual;
+}
+
/*
* Include the NCR5380 core code that we build our driver around
*/
diff --git a/drivers/scsi/g_NCR5380.h b/drivers/scsi/g_NCR5380.h
index 81b22d9..08c91b7 100644
--- a/drivers/scsi/g_NCR5380.h
+++ b/drivers/scsi/g_NCR5380.h
@@ -26,7 +26,8 @@
int c400_ctl_status; \
int c400_blk_cnt; \
int c400_host_buf; \
- int io_width;
+ int io_width; \
+ int pdma_residual;

#define NCR53C400_mem_base 0x3880
#define NCR53C400_host_buffer 0x3900
@@ -35,7 +36,7 @@
#define NCR5380_dma_xfer_len generic_NCR5380_dma_xfer_len
#define NCR5380_dma_recv_setup generic_NCR5380_pread
#define NCR5380_dma_send_setup generic_NCR5380_pwrite
-#define NCR5380_dma_residual NCR5380_dma_residual_none
+#define NCR5380_dma_residual generic_NCR5380_dma_residual

#define NCR5380_intr generic_NCR5380_intr
#define NCR5380_queue_command generic_NCR5380_queue_command

--