Re: Kernel v2.2.15pre3 SCSI 2940U2W with AIC7895 boot still broke.

From: Doug Ledford (dledford@redhat.com)
Date: Tue Jan 18 2000 - 16:49:32 EST


M Sweger wrote:

> I didn't try 2.2.15pre2 since it didn't have any SCSI fixes. However,
> I did compile 2.2.14 and it hangs at the same location. I then tried
> 2.2.15pre1 and it hangs during boot at the same location trying to
> download Narrow Channel B instructions.
annels is 404.
>
> Note: in linux 2.2.1 there are no SCSI error messsages being output to
> console;however, in linux 2.2.13 this is the following error message
> that is output
>
> (scsi0) Data overrun detected in Data-In phase, tag 1
> Have seen Data Phase length=255 NuSgs=1
>
> sg[0] Addr 0x7fec380 length=255
>
> I assume this is what is being fixed in the 2.2.14 scsi patches although
> it doesn't seem to cause any problem for me in linux 2.2.13.

Nope. Read on.

> Below are two different sets of "dmesg" listings. The first is for
> linux 2.2.1 and the seconnd is for linux 2.2.13 with the data phase error.
> Just search for "version" to jump to the second listing.

> scsi1 : Adaptec AHA274x/284x/294x (EISA/VLB/PCI-Fast SCSI) 5.1.10/3.2.4
> <Adaptec AIC-7895 Ultra SCSI host adapter>
> scsi2 : SCSI host adapter emulation for IDE ATAPI devices
> scsi : 3 hosts.
> Vendor: WDIGTL Model: WDE9100-1807A4 Rev: 1.30
> Type: Direct-Access ANSI SCSI revision: 02
> Detected scsi disk sda at scsi0, channel 0, id 0, lun 0
> scsi : detected 1 SCSI disk total.
> (scsi0:0:0:0) Synchronous at 40.0 Mbyte/sec, offset 8.
> SCSI device sda: hdwr sector= 512 bytes. Sectors= 17783204 [8683 MB] [8.7 GB]

Next boot...

> scsi1 : Adaptec AHA274x/284x/294x (EISA/VLB/PCI-Fast SCSI) 5.1.20/3.2.4
> <Adaptec AIC-7895 Ultra SCSI host adapter>
> scsi : 2 hosts.
> (scsi0:0:0:0) Synchronous at 40.0 Mbyte/sec, offset 8.
> (scsi0:0:0:0) Data overrun detected in Data-In phase, tag 1;
> Have seen Data Phase. Length=255, NumSGs=1.
> sg[0] - Addr 0x7fec380 : Length 255
> Vendor: WDIGTL Model: WDE9100-1807A4 Rev: 1.30
> Type: Direct-Access ANSI SCSI revision: 02
> Detected scsi disk sda at scsi0, channel 0, id 0, lun 0
> scsi : detected 1 SCSI disk total.

There are two distinctly different problems here. One, the latest driver
hangs when downloading the channel B sequencer to your 7895. Between 5.1.20
and 5.1.23, the sequencer download code was changed to fix similar hangs on
other machines. There is a pause between each of the bytes we download now.
That solved the other machine's lockup, and evidently caused yours. If you go
into the aic7xxx_download_instr() routine in the aic7xxx driver and take out
(or shorten) some of the udelay(50); calls between the various aic_outb()
calls, does the hang problem with 5.1.23 go away? If so, it would help to
have some sort of "magical number" where things quit working on your machine.

In 5.1.21, the change to that function was made because of some 3940W
controllers (at least a couple) that would lock up during sequencer download.
As I recall, I had to add a delay in between each byte of download to make it
work. The back to back byte downloads were failing. Because I didn't have
any clear data about what was required to fix the problem, I very well may
have allocated more delay than is needed ;-) I would suggest trying something
like a udelay(2); in between each of the downloads, and a udelay(25); after
the last instruction byte. That might make your problem go away and possibly
also not ruin the purpose of the original fix. I'll have to send a note to
the aic7xxx mailing list about the change as well (if those numbers work for
you) to try and get the people who had cards that locked with 5.1.20 to try
the new numbers.

The second problem is the sg overrun error. Notice that in the old driver,
the device speed was negotiated *after* the INQUIRY data is printed out. In
the new version, it happens *before* the INQUIRY data comes out. The
difference here is that in the second version, the drive is already
transferring data in Wide mode by the time it transfers the INQUIRY data.
I've got another report of the sequencer mishandling WIDE_RESIDUE messages,
and this appears to be the same thing. Since bytes are transferred two at a
time in wide mode, what's happening is that the drive is trying to transfer
the full 255 bytes that the INQUIRY command is requesting, and in order to do
so, it has to transfer 256 bytes and then send a WIDE_RESIDUE message. We
aren't allowing the message to come through before the sequencer is signalling
the overrun. That's where that error message is coming from. FWIW, if you
weren't getting the overrun error, you would be getting bus resets because the
driver would reject the wide residue message and that would piss the disk off
(the other report I have has a device that doesn't send back the full 255
bytes on the INQUIRY and so when it does its WIDE_RESIDUE, a reset loop
occurs).

In any case, if you can get me some magical numbers to plug into the
aic7xxx_download_instr() routine, then I'll try to get a new aic7xxx version
out that includes those magical numbers and a fix for the WIDE_RESIDUE problem
(if I can get it done in time, sequencer hacking is far harder than kernel
driver hacking).

-- 
  Doug Ledford   <dledford@redhat.com>
   Opinions expressed are my own, but
      they should be everybody's.

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Sun Jan 23 2000 - 21:00:19 EST