Re: SCSI timeouts with Advansys driver under Linux

Leonard N. Zubkoff (lnz@dandelion.com)
Wed, 4 Dec 1996 23:54:20 -0800


Date: Wed, 4 Dec 96 17:54 PST
From: bobf@unix.advansys.com (Robert Frey)

> scsi0 channel 0 : resetting for second half of retries.
> SCSI bus is being reset for host 0 channel 0.
> advansys: advansys_reset: timeout serial number changed for request 256808

I have never seen an abort request return "timeout serial number...".
The way I've written the driver I assume different serial numbers in
an abort request to mean that the command was completed after the abort
was started. The abort request is ignored and I don't do a 'done' on the
command.

> At this point, the "cat" process which was reading from the cd drive
> blocks and becomes unkillable. Any process which calls sync() blocks
> forever and becomes unkillable.
Probably because no 'done' was ever done on the aborted command.

Perhaps Leonard can shed some light on how the serial number mismatch
occurred and comment on whether my handling of it is correct.

I took a look over your driver and there is a missing component. Aborts are
always asynchronous. Resets can be either synchronous (as in the case above)
or asynchronous. In the synchronous case, the reset is occurring for a command
that is no longer active; the command completed with an error which the
mid-level has decided might work after a reset. It's only the asynchronous
case (i.e. a timeout occurs) where we have to be careful that the command has
not already completed.

From BusLogic.c:

    /*
      If this is an Asynchronous Reset and this Command has already completed,
      then no Reset is necessary.
    */
    if (ResetFlags & SCSI_RESET_ASYNCHRONOUS)
      {
        TargetID = Command->target;
        if (Command->serial_number != Command->serial_number_at_timeout)
          {
            printk("scsi%d: Unable to Reset Command to Target %d - "
                   "Already Completed or Reset\n",
                   HostAdapter->HostNumber, TargetID);
            Result = SCSI_RESET_NOT_RUNNING;
            goto Done;
          }

From advansys.c:

#if LINUX_VERSION_CODE >= ASC_LINUX_VERSION(1,3,89)
    if (scp->serial_number != scp->serial_number_at_timeout) {
        ASC_PRINT1(
            "advansys_reset: timeout serial number changed for request %x\n",
            (unsigned) scp);
        do_scsi_done = ASC_FALSE;
        scp_found = ASC_FALSE;
        ret = SCSI_RESET_NOT_RUNNING;
    } else
#endif /* version >= v1.3.89 */

I think it would be sufficient to add a
(reset_flags & SCSI_RESET_ASYNCHRONOUS) test to your code.
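
As a rough illustration of the combined test, here is a small standalone C
model. It is only a sketch: cmd_model, reset_action, check_reset and the flag
value MODEL_RESET_ASYNCHRONOUS are stand-ins invented for this example, not the
kernel's or the driver's definitions. It assumes the serial number mismatch
should short-circuit the reset only when the asynchronous flag is set.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the mid-level's asynchronous reset flag;
   the value is illustrative, not taken from the kernel headers. */
#define MODEL_RESET_ASYNCHRONOUS 0x02u

/* Hypothetical outcome of the check: carry on with the reset, or report
   that the command is no longer running. */
enum reset_action { RESET_PROCEED, RESET_NOT_RUNNING };

/* Minimal model of the two command fields the check compares. */
struct cmd_model {
    unsigned long serial_number;
    unsigned long serial_number_at_timeout;
};

/* Treat a serial number mismatch as "already completed" only for an
   asynchronous (timeout-driven) reset. For a synchronous reset the
   mismatch is ignored and the reset goes ahead. */
static enum reset_action
check_reset(const struct cmd_model *cmd, unsigned int reset_flags)
{
    assert(cmd != NULL);

    if ((reset_flags & MODEL_RESET_ASYNCHRONOUS) &&
        cmd->serial_number != cmd->serial_number_at_timeout)
        return RESET_NOT_RUNNING;

    return RESET_PROCEED;
}
```

With the flag test in front, a synchronous reset for a command that already
completed with an error would fall through to the normal reset path instead
of being ignored.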

Leonard