Re: sparc64 mt race?

Rich Sahlender (rsahlen@voicenet.com)
Fri, 30 Oct 1998 16:21:32 -0500 (EST)


This is an Ultra 1 Creator, basically just a newer 170E. The tape drive
is in an external cabinet with a terminator, and the cable has a wide
connector at the Ultra end and a Centronics connector at the drive end.
Running "mt -f /dev/st0 status" does the same as "offline", although I get
timeouts instead of parity errors on the console.

I also tried connecting a small BoxHill disk tower with a couple of 3 GB
narrow scsi disks in it. The machine hangs during boot after probing
the scsi bus and getting timeouts. I'll try a wide/fast external 9 GB disk
early next week and see if it's just a wide-to-narrow problem or
if it's *any* external device.

Roger J. Allen wrote:
> On Thu, 29 Oct 1998, Rich Sahlender wrote:
>
> > I'm not sure if this is a kernel, scsi, or 32- vs. 64-bit issue...
> >
> > Using the latest 2.1.126 from vger cvs with ultrapenguin-1.0.9 on an
> > Ultra1, a simple "mt -f /dev/st0 off" to unload a tape sends the cpu
>
> Is the offline command your only problem, or do you also get errors with
> other mt commands that access the drive?
>
> > to 100%. Running processes continue but cannot terminate, new processes
> > will not start. dmesg shows:
> >
> > esp0: IRQ 3,7e0 SCSI ID 7 Clock 40 MHz CCF=8 Time-Out 167 NCR53C9XF(espfast) detected
> > esp0: FAST chip is fasHME (family=10, version=5)
> > ESP: Total of 1 ESP hosts found, 1 actually in use.
> > scsi0 : Sparc ESP366-HME
> > scsi : 1 host.
>
> My ultra1 170E has the same values for the esp. I think that is where
> the problem(s) lie(s).
>
> > Vendor: SEAGATE Model: ST32171W SUN2.1G Rev: 7462
> > Type: Direct-Access ANSI SCSI revision: 02
> > Detected scsi disk sda at scsi0, channel 0, id 0, lun 0
> > Vendor: EXABYTE Model: EXB-85008SQANXBA Rev: 07J0
> > Type: Sequential-Access ANSI SCSI revision: 02
> > Detected scsi tape st0 at scsi0, channel 0, id 4, lun 0
>
> I may have the same problem, but my dmesg output shows error messages.
> This is with an Exabyte 8505XLE attached to the EXTERNAL scsi port,
> which works fine with Solaris 2.6. Is your tape drive external or
> internal? Maybe it does not matter.
>
> > Vendor: TOSHIBA Model: XM5701TASUN12XCD Rev: 0997
> > Type: CD-ROM ANSI SCSI revision: 02
> > Detected scsi CD-ROM sr0 at scsi0, channel 0, id 6, lun 0
> > scsi : detected 1 SCSI tape 1 SCSI cdrom 1 SCSI disk total.
> > esp0: Disabling sync for buggy Toshiba CDROM.
> ^^^^^
> The Exabyte may have similar ?bug(s)?.
>
> > esp0: Disabling DISCONNECT for target 6 lun 0
> > esp0: target 6 asynchronous
> > Uniform CDROM driver Revision: 2.14
> > esp0: Disabling DISCONNECT for target 0 lun 0
> > esp0: 16 byte WIDE transfers enabled for target 0.
> > esp0: target 0 [period 100ns offset 15 20.00MHz FAST-WIDE SCSI-II]
> > SCSI device sda: hdwr sector= 512 bytes. Sectors= 4157201 [2029 MB] [2.0 GB]
> > sunhme.c:v1.2 10/Oct/96 David S. Miller (davem@caipfs.rutgers.edu)
> > eth0: HAPPY MEAL (SBUS) 10/100baseT Ethernet 08:00:20:8d:21:cd
> > eth0: Link is up using internal transceiver at 100Mb/s, Full Duplex.
> >
> > Nothing unusual there nor in syslog. Is there anything else I can
> > provide to help?
> >
>
> Instead of "mt -f /dev/st0 off", try "mt -f /dev/nst0 status". When I
> try to get the status, I get esp parity errors:
>
> esp0: Disabling DISCONNECT for target 4 lun 0
> esp0: data bad parity detected.
> esp0: data bad parity detected.
> esp0: yieee, bytes_sent < 0!
> esp0: csz=0 fifocount=0 ecount=0
> esp0: use_sg=0 ptr=0000000000500006 this_residual=0
> esp0: Forcing async for target 4
> esp0: got status only, esp0: bad parity somewhere mout= 5
> esp0: still in msgout, parity error assumed
> esp0: data bad parity detected.
> esp0: data bad parity detected.
> esp0: yieee, bytes_sent < 0!
> esp0: csz=0 fifocount=0 ecount=0
> esp0: use_sg=0 ptr=000000000050000c this_residual=0
> esp0: Forcing async for target 4
> esp0: got status only, esp0: bad parity somewhere mout= 5
> esp0: still in msgout, parity error assumed
>
> Someone mentioned checking the termination. Since the ultra has a 68
> pin scsi and the Exabyte has a 50 pin centronics connector, I tried an
> active terminator and a 68 to 50 pin adapter with high bit termination,
> but that did not help (nor hurt).
>
> One symptom that I noticed was the output from /proc/scsi/esp/0. Before
> the tape drive is accessed, it shows:
>
> Sparc ESP Host Adapter:
> PROM node fffffffff0061044
> PROM name SUNW,fas
> ESP Model Happy Meal FAS
> DMA Revision Rev HME/FAS
> Live Targets [ 0 1 4 6 ]
>
> Target # config3 Sync Capabilities Disconnect Wide
> 0 000000e3 [5f,04] no yes
> 1 000000e3 [5f,04] no yes
> 4 000000a1 [5f,04] no no
> 6 000000a1 [5f,04] no no
>
> But after running an "mt -f /dev/nst0 status" command, the esp complains
> with the esp errors and the cat /proc/scsi/esp/0 shows:
>
> Sparc ESP Host Adapter:
> PROM node fffffffff0061044
> PROM name SUNW,fas
> ESP Model Happy Meal FAS
> DMA Revision Rev HME/FAS
> Live Targets [ 0 1 4 6 ]
>
> Target # config3 Sync Capabilities Disconnect Wide
> 0 000000e3 [5f,04] no yes
> 1 000000e3 [5f,04] no yes
> 4 000000e1 [5f,04] no yes
> 6 000000a1 [5f,04] no no
>
> This shows the tape drive on target 4 as a WIDE scsi device instead of NARROW!
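
The change between the two /proc/scsi/esp/0 listings can be checked by XORing the config3 values reported for target 4. This is a minimal sketch in C. The name `CONFIG3_WIDE_BIT` is a label made up here for illustration. It is inferred only from the Wide column flipping from "no" to "yes", and is not taken from esp.c or esp.h.

```c
/* config3 for target 4 as printed by /proc/scsi/esp/0 before and after
 * the "mt -f /dev/nst0 status" command in the listings above. */
#define CONFIG3_BEFORE 0x000000a1u
#define CONFIG3_AFTER  0x000000e1u

/* Hypothetical label: the bits that differ between the two values.
 * Assumed (not confirmed from the driver source) to be the wide enable. */
#define CONFIG3_WIDE_BIT (CONFIG3_BEFORE ^ CONFIG3_AFTER)

/* Returns nonzero when the assumed wide bit is set in a config3 value. */
static int config3_is_wide(unsigned int config3)
{
    return (config3 & CONFIG3_WIDE_BIT) != 0;
}
```

Applied to the four targets in the listings, `config3_is_wide` agrees with the Wide column in every row: 0xe3 and 0xe1 come out wide, 0xa1 does not.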
>
> If I try to read from the drive, then the esp parity errors are constant
> and nothing can be done except reboot.
>
> In the kernel drivers/scsi/esp.c code, there is a section where it
> checks to see if the device is wide or not. There is some code that
> claims that Toshiba CD-ROMs are buggy and they do not get checked to see
> if they are wide (or synchronous). I copied the Toshiba code where it
> checks if the device is wide, changed it to also check for Exabyte tape
> drives, re-built the kernel, rebooted, and then I could read from the
> tape drive without any errors!
>
> Here is what I added (I wish I had swapped the order of the files for the
> diff command; as it stands, the lines marked "-" are the ones I added):
>
> *** esp.c Thu Oct 29 18:17:11 1998
> --- esp.c-dist Tue Sep 22 19:13:54 1998
> ***************
> *** 1429,1436 ****
> */
> if(esp->erev == fashme && !SDptr->wide) {
> if(!SDptr->borken &&
> - (SDptr->type != TYPE_TAPE ||
> - strncmp(SDptr->vendor, "EXABYTE", 7)) &&
> (SDptr->type != TYPE_ROM ||
> strncmp(SDptr->vendor, "TOSHIBA", 7))) {
> build_wide_nego_msg(esp, 16);
> --- 1429,1434 ----
>
> Maybe the test should be for just my model of Exabyte, or for all Exabytes,
> like the above code. Are there wide Exabyte tape drives that this would
> break? Are all Toshiba CD-ROMs buggy, narrow, and asynchronous?
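
The condition in the quoted patch can be read as a standalone predicate, which makes it easier to see which devices it excludes. This is a sketch only. `struct dev_info`, `is_excluded` and `should_try_wide` are hypothetical stand-ins for the kernel's device structure and the inline test in esp.c, and the outer `esp->erev == fashme` check is left out. The `TYPE_TAPE` and `TYPE_ROM` values are the standard SCSI peripheral device type codes.

```c
#include <string.h>

#define TYPE_TAPE 0x01  /* SCSI sequential-access device */
#define TYPE_ROM  0x05  /* SCSI CD-ROM device */

/* Hypothetical stand-in for the fields the esp.c test looks at. */
struct dev_info {
    int type;            /* SCSI peripheral device type */
    const char *vendor;  /* vendor string from INQUIRY */
    int borken;          /* device already marked as broken */
    int wide;            /* wide negotiation already done */
};

/* Nonzero when the device has the given type and vendor prefix. */
static int is_excluded(const struct dev_info *d, int type, const char *vendor)
{
    return d->type == type && strncmp(d->vendor, vendor, strlen(vendor)) == 0;
}

/* Mirrors the patched condition: skip wide negotiation for devices that
 * are already wide or marked borken, for Exabyte tapes, and for Toshiba
 * CD-ROMs. Everything else gets a wide negotiation attempt. */
static int should_try_wide(const struct dev_info *d)
{
    if (d->wide || d->borken)
        return 0;
    if (is_excluded(d, TYPE_TAPE, "EXABYTE"))
        return 0;
    if (is_excluded(d, TYPE_ROM, "TOSHIBA"))
        return 0;
    return 1;
}
```

Because the match is on vendor and device type only, any Exabyte tape drive is skipped regardless of model, which is exactly the concern raised in the questions above.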
>
> After changing the code in esp.c, I am still getting timeouts on the
> scsi bus when I rewind the tape after reading only a few hundred megabytes.
> The tape itself is read without any errors using:
>
> cpio -itv -C 1024 -I /dev/nst0
>
> esp0: Disabling DISCONNECT for target 4 lun 0
> esp0: target 4 [period 200ns offset 11 5.00MHz synchronous SCSI]
>
> Later, when it is rewound, these occur:
>
> scsi : aborting command due to timeout : pid 9822, scsi0, channel 0, id 1, lun 0 Write (6) 00 40 38 02 00
> esp0: Aborting command
> esp0: dumping state
> esp0: dma -- cond_reg<b2b70a10> addr<1e000000>
> esp0: SW [sreg<17> sstep<04> ireg<20>]
> esp0: HW reread [sreg<12> sstep<cb> ireg<00>]
> esp0: current command [tgt<04> lun<00> pphase<UNISSUED> cphase<SLCTNORM>]
> esp0: disconnected
> SCSI disk error : host 0 channel 0 id 1 lun 0 return code = 50000
> scsidisk I/O error: dev 08:11, sector 16440
> scsi : aborting command due to timeout : pid 9823, scsi0, channel 0, id 1, lun 0 Write (6) 02 40 30 02 00
> esp0: Aborting command
> esp0: dumping state
> esp0: dma -- cond_reg<b2b70a10> addr<1e000000>
> esp0: SW [sreg<17> sstep<04> ireg<20>]
> esp0: HW reread [sreg<12> sstep<cb> ireg<00>]
> esp0: current command [tgt<04> lun<00> pphase<UNISSUED> cphase<SLCTNORM>]
> esp0: disconnected
> SCSI disk error : host 0 channel 0 id 1 lun 0 return code = 50000
> scsidisk I/O error: dev 08:11, sector 147504
> scsi : aborting command due to timeout : pid 9974, scsi0, channel 0, id 1, lun 0 Read (6) 02 40 30 02 00
> esp0: Aborting command
> esp0: dumping state
> esp0: dma -- cond_reg<b2b70a10> addr<1e000000>
> esp0: SW [sreg<17> sstep<04> ireg<20>]
> esp0: HW reread [sreg<12> sstep<cb> ireg<00>]
> esp0: current command [tgt<04> lun<00> pphase<UNISSUED> cphase<SLCTNORM>]
> esp0: disconnected
> SCSI disk error : host 0 channel 0 id 1 lun 0 return code = 50000
> scsidisk I/O error: dev 08:11, sector 147504
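
The "return code = 50000" in the log above can be split into its component bytes. This sketch assumes the value is printed in hexadecimal and follows the usual Linux SCSI result layout (status in the low byte, then message, host, and driver bytes). `struct scsi_result` and `decode_result` are names made up for this example. `DID_ABORT` carries the value defined in the kernel's scsi.h.

```c
/* Host-byte code for an aborted command, as defined in the kernel's scsi.h. */
#define DID_ABORT 0x05

/* Fields of a SCSI result word, from least to most significant byte. */
struct scsi_result {
    unsigned int status;  /* bits 0-7: status byte from the target */
    unsigned int msg;     /* bits 8-15: message byte */
    unsigned int host;    /* bits 16-23: host adapter (DID_*) code */
    unsigned int driver;  /* bits 24-31: driver code */
};

/* Splits a result word into its four byte-wide fields. */
static struct scsi_result decode_result(unsigned int code)
{
    struct scsi_result r;
    r.status = code & 0xff;
    r.msg = (code >> 8) & 0xff;
    r.host = (code >> 16) & 0xff;
    r.driver = (code >> 24) & 0xff;
    return r;
}
```

Under these assumptions, `decode_result(0x50000)` gives a host byte of 0x05 (`DID_ABORT`) with the other fields zero, which is consistent with the "aborting command due to timeout" lines that precede each disk error.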
>
> I also get timeouts with "mt -f /dev/st0 offline".
>
> My wild guess is that the disabling of DISCONNECT for all devices on the
> fashme/fasHME/espfast/ESP366-HME/NCR53C9XF (a few lines later in esp.c)
> is causing the timeouts.
>
> --
> Roger J. Allen Rush-Presbyterian-St. Luke's Medical Center
> System Administrator Chicago, IL USA
> Surgical Information Systems Voice: (312)-942-4825
> Internet: rja@sis.rpslmc.edu FAX: (312)-733-6921
>
>
>
