Re: SCSI problems still there

Michael Thomas (mike@fasolt.mtcc.com)
25 Feb 1997 12:15:10 -0800


lnz@dandelion.com (Leonard N. Zubkoff) writes:
> From: "Stephen Davies" <scldad@sdc.com.au>
>
> SCSI support in 2.0.29 has changed from earlier versions but still fails
> for me when I try to tar a SCSI disk to a SCSI tape.

This sounds similar to what I'm experiencing
with a BusLogic 948 controller, a couple of SCSI
disks and an HP SureStore tape drive (SCSI). Aside
from some boneheaded termination problems, since
resolved, I'm still getting unhappy results. When
I do a backup across the net to the tape, I don't
seem to have any problems. However, when I do a
local backup I've been getting intermittent lockups
with damaged filesystems, along with some
interesting garbage in the kernel logs, like:

/* a whole bunch of these, 8:01 is the system disk */
attempt to access beyond end of device
08:01: rw=0, want=1952542573, limit=987966

/* another boo boo with kfree, interestingly the same address as
a previous hang, and the next one */
kfree of non-kmalloced memory: 017cf018, next= 3a525058, order=1330793472
kfree of non-kmalloced memory: 0247c018, next= 742d6f77, order=1718558835

Note that in both cases the values look like ASCII
garbage. In the first case it is make_request() in
ll_rw_blk.c that finds the error in the
buffer_head, and in the second it is kfree() in
kmalloc.c that finds a corrupt page_descriptor.
Since a great deal of what the tape is writing is
ASCII files, this looks very suspiciously like
either a buffer overrun or a bogus/stale pointer
kind of error. (I suppose I could turn on
SADISTIC_KMALLOC to find out if it's the stale
variety.)

I've recompiled my kernel to dump out the first
16 bytes of the header when the error occurs, to
see if the contents look familiar, so we'll see.

> If you look at the changes in 2.0.29, you will see that there have been no
> changes to the SCSI subsystem nor to the Adaptec 1542 driver that would explain
> this lockup problem. In fact, 2.0.29 contains very few changes of any kind.

If this is the same problem, it would tend to
implicate either the generic SCSI code or st.c,
since I'm getting similar unhappiness on the
BusLogic controller. I've been sending Kai mail
with as much info as I can dig up about my
configuration and what I'm actually seeing. If you
know of anything else that might be suspicious,
I'd be happy to help track it down.

-- 
Michael Thomas	(mike@mtcc.com http://www.mtcc.com/~mike/)
        "I dunno, that's an awful lot of money."
			Beavis