AHA1542 driver is buggy

Steven S. Dick (ssd@nevets.oau.org)
Tue, 5 Oct 1999 01:54:10 -0400 (EDT)


I have an aha1542, and it appears the driver is still buggy.
I don't think the reset code in the driver has ever worked correctly.

I have a couple of cd's with bad scratches; when any of my scsi drives
attempts to read over a scratch, the 1542 goes into a reset loop.
Before the following patch, it would reset a couple of times and then
die with "Panic: freeing free memory" or something close...

(this patch against 2.2.12, but probably works in others too)

--- linux/drivers/scsi/aha1542.c.orig Mon Aug 9 15:04:39 1999
+++ linux/drivers/scsi/aha1542.c Sun Sep 26 12:27:27 1999
@@ -482,7 +482,10 @@
}

my_done = SCtmp->scsi_done;
- if (SCtmp->host_scribble) scsi_free(SCtmp->host_scribble, 512);
+ if (SCtmp->host_scribble) {
+ scsi_free(SCtmp->host_scribble, 512);
+ SCtmp->host_scribble = 0;
+ }

/* Fetch the sense data, and tuck it away, in the required slot. The
Adaptec automatically fetches it, and there is no guarantee that

This fix looked blatently obvious to me, and after making it, I examined
all the other scsi drivers and noted that all of them use scsi_free in
this manner *except* for aha1542 and aha1740. Most of the places in these
two drivers free the structure containing the scsi_free pointer one or two
lines after calling scsi_free, but the above code does not, and one spot
in the 1740 driver might not.

With this patch, the kernel no longer panics, and an I/O error is returned
to the application. Unfortunately, after this, the device no longer
responds, and any access to the device causes the process to block but
still remain killable. I fear I've fixed a symptom, and not the real bug.
(I.e., it still hurts, but the patient no longer dies.)

The device itself does not hang actually--just the driver.
My second cdrom is a 7 platter changer, and after accessing the scratch,
the LUN containing the damaged disc is no longer usable, but the rest
of the LUN's on that device continue to work fine.

Has anyone else had similar problems with the aha1542 or 1740?
Could someone help me make enough sense out of the 1542 code to
finish fixing this bug? For me, the problem is 100% repeatable,
and only takes a few moments to set up a trigger.

Steve
ssd@nevets.oau.org

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/