Re: SCSI timeouts with Advansys driver under Linux

Nathan Bryant (nathan@burgessinc.com)
Wed, 4 Dec 1996 21:56:50 -0500 (EST)


On Wed, 4 Dec 1996, Robert Frey wrote:

> What version of the Linux kernel are you using? (Apparently something
> later than v1.3.89, based on the reset code.) What version of the
> AdvanSys driver are you using? (Printed at boot time and available
> from 'cat /proc/scsi/advansys/0'.) The latest is 2.0.

My kernel is version 2.0.27; the problem also occurred on 2.0.26. The
AdvanSys driver is the stock driver included in 2.0.27. The dmesg output:

scsi0 : AdvanSys SCSI 2.0: ISA PnP 16 CDB: BIOS C800, IO 110-11F, IRQ 11, DMA 6

By the way, I used the same AdvanSys card in a different machine for a
while with no problems. That machine was a PCI Pentium with 32 MB of RAM
and also had an AHA1542 (not the same AHA1542 that is in this machine). On
the machine where the AdvanSys currently resides, both Windows 95 and
Linux ran without problems until now.

> Do you mean the single 'cat' command was running for several minutes
> before the timeout? So it was reading the CD-ROM sequentially.
> That would appear to be the case. According to the message below the
> command timed out on a read of 16KB (8 * 2KB) about 86MB (A44A * 2KB)
> into the CD.

Exactly. I was catting the contents of a CD onto a raw partition. The
timeout happened well into the copy. (In fact, the filesystem on the CD is
only 106 megs, so it was about 80% done. File that under "useless
statistics.")
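
The figures quoted above can be checked against the bytes in the kernel
message (00 a4 4a 08 00). The following is a minimal sketch, assuming
those five bytes are CDB bytes 1-5 of the READ(6) command (the opcode
being printed as the name "Read (6)") and that the CD-ROM uses 2048-byte
blocks. The function name decode_read6 is made up for illustration.

```python
BLOCK_SIZE = 2048  # assumed CD-ROM logical block size, per the "2KB" above


def decode_read6(tail: bytes, block_size: int = BLOCK_SIZE) -> dict:
    """Decode CDB bytes 1-5 of a READ(6) command (opcode byte omitted)."""
    if len(tail) != 5:
        raise ValueError("expected the 5 bytes that follow the opcode")
    lun = tail[0] >> 5  # top 3 bits of byte 1
    # 21-bit logical block address: low 5 bits of byte 1, then bytes 2 and 3
    lba = ((tail[0] & 0x1F) << 16) | (tail[1] << 8) | tail[2]
    blocks = tail[3] or 256  # in READ(6), a length of 0 means 256 blocks
    return {
        "lun": lun,
        "lba": lba,
        "blocks": blocks,
        "length_bytes": blocks * block_size,
        "offset_bytes": lba * block_size,
        "control": tail[4],
    }


cmd = decode_read6(bytes.fromhex("00 a4 4a 08 00"))
print(cmd["lba"], cmd["length_bytes"], cmd["offset_bytes"])
# 42058 16384 86134784
```

That is a 16KB read starting roughly 86 million bytes into the disc,
which is consistent with the copy being about four fifths done.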

>
> > scsi : aborting command due to timeout : pid 7890, scsi0, channel 0, id 2,
> > lun 0 Read (6) 00 a4 4a 08 00
> > scsi : aborting command due to timeout : pid 7890, scsi0, channel 0, id 2,
> > lun 0 Read (6) 00 a4 4a 08 00
> > scsi0 channel 0 : resetting for second half of retries.
> > SCSI bus is being reset for host 0 channel 0.
> > advansys: advansys_reset: timeout serial number changed for request 256808
>
> I have never seen an abort request return "timeout serial number...".
> The way I've written the driver I assume different serial numbers in
> an abort request to mean that the command was completed after the abort
> was started. The abort request is ignored and I don't do a 'done' on the
> command.
>
> > At this point, the "cat" process which was reading from the cd drive
> > blocks and becomes unkillable. Any process which calls sync() blocks
> > forever and becomes unkillable.
> Probably because no 'done' was ever done on the aborted command.
>
> Perhaps Leonard can shed some light on how the serial number mismatch
> occurred and comment on whether my handling of it is correct.

It may not be. But even if it isn't, that still leaves the question of why
the timeout occurred in the first place. Bad hardware, perhaps?
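
To make the suspected failure mode concrete, here is a toy model of the
abort policy as Robert describes it in the quoted text. It is not the
driver's code: the class, the field names and the return strings are all
invented for illustration, and it only assumes that a waiting process
cannot proceed until 'done' has been called on its command.

```python
from dataclasses import dataclass


@dataclass
class Command:
    serial_number: int             # current serial number of the command
    serial_number_at_timeout: int  # serial number recorded when it timed out
    done_called: bool = False      # True once completion has been reported

    def done(self) -> None:
        self.done_called = True


def abort(cmd: Command) -> str:
    """Toy version of the abort policy described in the quoted text."""
    if cmd.serial_number != cmd.serial_number_at_timeout:
        # Assumed to mean the command completed after the abort began,
        # so the request is ignored and done() is NOT called here.
        return "ignored"
    cmd.done()
    return "aborted"


def waiter_unblocked(cmd: Command) -> bool:
    # A process waiting on the command can only proceed once done() has run.
    return cmd.done_called


matching = Command(serial_number=7, serial_number_at_timeout=7)
mismatched = Command(serial_number=8, serial_number_at_timeout=7)
results = (abort(matching), abort(mismatched))
```

In this model, if the mismatched command never actually completed through
the normal path, waiter_unblocked(mismatched) stays False forever, which
would match the unkillable "cat" and the blocked sync() callers.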

+-----------------------+----------------------------------+
| Nathan Bryant         | Resident Unix Geek               |
| nathan@burgessinc.com | Burgess Business Solutions, Inc. |
+-----------------------+----------------------------------+