Re: Kernel v2.2.15pre5 SCSI 2940U2W with AIC7895 boot still broke (with fix).

From: M Sweger (mikesw@whiterose.net)
Date: Mon Feb 07 2000 - 17:30:07 EST


Hi all,

    Here is the solution needed to get kernel v2.2.15pre3/pre5 etc to
boot and finish downloading the instructions on the Adaptec 2940U2W
with AIC7895 for NArrow Channel B. I'll mention also what I did to
get this value. I'd like to make a suggestion also in conclusion too.

A). Change the udelay(50) in the function aic7xxx_download_instr()
    to the following values to find the magic number. I tried the
    following:
    all 0's
    all 1's
    all 5's
    all 15's
    all 25's
    all 50's
    all 1500's

    The above didn't work.

  Next removed all the udelays except last one and changed it to
udelay(15) to be like linux 2.2.13.
  Next commented out all the udelays()

     The above two configs didn't work.

B). I then put a printk(KERN_INFO "message"); into the beggining of this
     aic7xxx_download_instr() to see what was going on and found out
     that it wasn't calling this function for NARROW CHANNEL B. Thus,
     no matter what value that udelay() would have had, it wouldn't
     do any good.

C). I then put alot of printk(KERN_INFO "message"); into this driver
     to find out where it was getting hung. The printk's were put
     in functions configure_termination() after the "Cables present"
      and in the function aic7xxx_register(). It then BOOTED!!!!

      Thus printk's put enough delay somewhere to make it boot.
      Therefore, the solution is to permanently :) put printk's
      into the source code driver. This is undesirable, see item (D)

D). I then went one by one to find out which printk() made it work.
      However, when it crashed it was getting stuck on the
aic_outb(p,0,SCSIRATE) function in aic7xxx_register() before looking
at the information that the board or BIOS left us comment.

        The printk() in the function configure_termination() that
        is in this section of code,

                .....
                aic_outb(p, sxfrctl12, SXFRCTL1);
                write_brdctl(p,brddat);
                printk(KERN_INFO "works when this message is here");
                release_seeprom(p);
                .....

E). The next thing was to find the minimum udelay value to put
      in place of this printk() function.

       It was determined that a udelay(35) or greater would make
       this kernel 2.2.15pre5 boot.

       SUGGESTION: Put a udelay(50) here instead to be on the safe
                    side so that it isn't on the *edge* of a timing
                     problem.

       I also tried mdelay(1) too and it booted.

F). I also copied aic7xxx.c from the linux 2.2.13 kernel in place
      of this driver and I was able to boot linux 2.2.15pre5 too!

CONCLUSION:
      a. Is it possible to detect some register flag from the SCSI
         card and break out on that when it goes high vs. guessing
         a udelay() values?

      b. Is it possible to put a udelay(5) in the aic_inb() and aic_outb()
         functions so that anytime the function(s) are called they all
         get the same delay. Thus, one could remove all the udelay()
         thoroughout the driver.

         Moreover, change the udelay(1) to udelay(5) in the functions
          read_brdctl() and write_brdctl() too.

     c. With the above udelay(50) fix, retest the 3940W Adaptec cards
        to see if the udelay() values that were added to function
        aic7xxx_download_instr() can be removed or set back to what
        linux 2.2.13 kernel had. The reason I asking this is because
        this board series also got hung on download_instrs too and
        the above solution may correct it also.

        Anybody out there with this card to try out this fix and report
        back?

Well that's it.
Hope to see this fix in linux 2.2.15pre7 since it''s to late for
2.2.15pre6.

MIke

On Tue, 18 Jan 2000, Doug Ledford wrote:

> M Sweger wrote:
>
> > I didn't try 2.2.15pre2 since it didn't have any SCSI fixes. However,
> > I did compile 2.2.14 and it hangs at the same location. I then tried
> > 2.2.15pre1 and it hangs during boot at the same location trying to
> > download Narrow Channel B instructions.
> annels is 404.
> >
> > Note: in linux 2.2.1 there are no SCSI error messsages being output to
> > console;however, in linux 2.2.13 this is the following error message
> > that is output
> >
> > (scsi0) Data overrun detected in Data-In phase, tag 1
> > Have seen Data Phase length=255 NuSgs=1
> >
> > sg[0] Addr 0x7fec380 length=255
> >
> > I assume this is what is being fixed in the 2.2.14 scsi patches although
> > it doesn't seem to cause any problem for me in linux 2.2.13.
>
> Nope. Read on.
>
> > Below are two different sets of "dmesg" listings. The first is for
> > linux 2.2.1 and the seconnd is for linux 2.2.13 with the data phase error.
> > Just search for "version" to jump to the second listing.
>
> > scsi1 : Adaptec AHA274x/284x/294x (EISA/VLB/PCI-Fast SCSI) 5.1.10/3.2.4
> > <Adaptec AIC-7895 Ultra SCSI host adapter>
> > scsi2 : SCSI host adapter emulation for IDE ATAPI devices
> > scsi : 3 hosts.
> > Vendor: WDIGTL Model: WDE9100-1807A4 Rev: 1.30
> > Type: Direct-Access ANSI SCSI revision: 02
> > Detected scsi disk sda at scsi0, channel 0, id 0, lun 0
> > scsi : detected 1 SCSI disk total.
> > (scsi0:0:0:0) Synchronous at 40.0 Mbyte/sec, offset 8.
> > SCSI device sda: hdwr sector= 512 bytes. Sectors= 17783204 [8683 MB] [8.7 GB]
>
> Next boot...
>
> > scsi1 : Adaptec AHA274x/284x/294x (EISA/VLB/PCI-Fast SCSI) 5.1.20/3.2.4
> > <Adaptec AIC-7895 Ultra SCSI host adapter>
> > scsi : 2 hosts.
> > (scsi0:0:0:0) Synchronous at 40.0 Mbyte/sec, offset 8.
> > (scsi0:0:0:0) Data overrun detected in Data-In phase, tag 1;
> > Have seen Data Phase. Length=255, NumSGs=1.
> > sg[0] - Addr 0x7fec380 : Length 255
> > Vendor: WDIGTL Model: WDE9100-1807A4 Rev: 1.30
> > Type: Direct-Access ANSI SCSI revision: 02
> > Detected scsi disk sda at scsi0, channel 0, id 0, lun 0
> > scsi : detected 1 SCSI disk total.
>
> There are two distinctly different problems here. One, the latest driver
> hangs when downloading the channel B sequencer to your 7895. Between 5.1.20
> and 5.1.23, the sequencer download code was changed to fix similar hangs on
> other machines. There is a pause between each of the bytes we download now.
> That solved the other machine's lockup, and evidently caused yours. If you go
> into the aic7xxx_download_instr() routine in the aic7xxx driver and take out
> (or shorten) some of the udelay(50); calls between the various aic_outb()
> calls, does the hang problem with 5.1.23 go away? If so, it would help to
> have some sort of "magical number" where things quit working on your machine.
>
> In 5.1.21, the change to that function was made because of some 3940W
> controllers (at least a couple) that would lock up during sequencer download.
> As I recall, I had to add a delay in between each byte of download to make it
> work. The back to back byte downloads were failing. Because I didn't have
> any clear data about what was required to fix the problem, I very well may
> have allocated more delay than is needed ;-) I would suggest trying something
> like a udelay(2); in between each of the downloads, and a udelay(25); after
> the last instruction byte. That might make your problem go away and possibly
> also not ruin the purpose of the original fix. I'll have to send a note to
> the aic7xxx mailing list about the change as well (if those numbers work for
> you) to try and get the people who had cards that locked with 5.1.20 to try
> the new numbers.
>
> The second problem is the sg overrun error. Notice that in the old driver,
> the device speed was negotiated *after* the INQUIRY data is printed out. In
> the new version, it happens *before* the INQUIRY data comes out. The
> difference here is that in the second version, the drive is already
> transferring data in Wide mode by the time it transfers the INQUIRY data.
> I've got another report of the sequencer mishandling WIDE_RESIDUE messages,
> and this appears to be the same thing. Since bytes are transferred two at a
> time in wide mode, what's happening is that the drive is trying to transfer
> the full 255 bytes that the INQUIRY command is requesting, and in order to do
> so, it has to transfer 256 bytes and then send a WIDE_RESIDUE message. We
> aren't allowing the message to come through before the sequencer is signalling
> the overrun. That's where that error message is coming from. FWIW, if you
> weren't getting the overrun error, you would be getting bus resets because the
> driver would reject the wide residue message and that would piss the disk off
> (the other report I have has a device that doesn't send back the full 255
> bytes on the INQUIRY and so when it does its WIDE_RESIDUE, a reset loop
> occurs).
>
> In any case, if you can get me some magical numbers to plug into the
> aic7xxx_download_instr() routine, then I'll try to get a new aic7xxx version
> out that includes those magical numbers and a fix for the WIDE_RESIDUE problem
> (if I can get it done in time, sequencer hacking is far harder than kernel
> driver hacking).
>
>
> --
> Doug Ledford <dledford@redhat.com>
> Opinions expressed are my own, but
> they should be everybody's.
>

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Mon Feb 07 2000 - 21:00:15 EST