Re: 2.0.34p8 scsi hassles (why doesn't anything work?)

Doug Ledford (dledford@dialnet.net)
Sun, 19 Apr 1998 03:44:23 -0500


Jordan Mendelson wrote:
>
> I just can't seem to win, if it's not one bug... it's another.. oi. After
> 3 day uptime (longest I've had in weeks) on my Squid cache server... I got
> this:
>
> SCSI disk error : host 0 channel 0 id 0 lun 0 return code = 2603000
> scsidisk I/O error: dev 08:02, sector 8, absolute sector 32138
> SCSI disk error : host 0 channel 0 id 0 lun 0 return code = 2603000
> scsidisk I/O error: dev 08:02, sector 13048, absolute sector 45178
> SCSI disk error : host 0 channel 0 id 0 lun 0 return code = 2603000
> scsidisk I/O error: dev 08:02, sector 2, absolute sector 32132
> SCSI disk error : host 0 channel 0 id 0 lun 0 return code = 2603000
> scsidisk I/O error: dev 08:02, sector 10, absolute sector 32140
> SCSI disk error : host 0 channel 0 id 0 lun 0 return code = 2603000
> scsidisk I/O error: dev 08:02, sector 65588, absolute sector 97718
> SCSI disk error : host 0 channel 0 id 0 lun 0 return code = 2603000
> scsidisk I/O error: dev 08:02, sector 147496, absolute sector 179626

These indicate a drive failure of some sort (bad sectors, non-responsive
drive, whatever) and aren't directly related to the updated aic7xxx driver,
except that the updated driver properly reports these errors to the upper
level code, whereas the old driver didn't.
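
For anyone who wants to cross-check the quoted error lines, here is a
minimal sketch in Python. It assumes the dev field is major:minor printed
in hex, and the names LOG, parse_errors and partition_offsets are purely
illustrative (they are not part of any kernel tool). It only pulls out the
sector numbers and compares the relative and absolute values; it says
nothing about why the drive returned the errors.

```python
import re
from collections import Counter

# The scsidisk lines copied from the report quoted in this message.
LOG = """\
scsidisk I/O error: dev 08:02, sector 8, absolute sector 32138
scsidisk I/O error: dev 08:02, sector 13048, absolute sector 45178
scsidisk I/O error: dev 08:02, sector 2, absolute sector 32132
scsidisk I/O error: dev 08:02, sector 10, absolute sector 32140
scsidisk I/O error: dev 08:02, sector 65588, absolute sector 97718
scsidisk I/O error: dev 08:02, sector 147496, absolute sector 179626
scsidisk I/O error : dev 08:02, sector 360484, absolute sector 392614
"""

# Tolerates the optional space before the colon seen in one of the lines.
PATTERN = re.compile(
    r"scsidisk I/O error\s*: dev ([0-9a-fA-F]{2}):([0-9a-fA-F]{2}), "
    r"sector (\d+), absolute sector (\d+)"
)


def parse_errors(text):
    """Return (major, minor, sector, absolute_sector) for each error line."""
    errors = []
    for match in PATTERN.finditer(text):
        major = int(match.group(1), 16)  # assumed to be hex
        minor = int(match.group(2), 16)  # assumed to be hex
        sector = int(match.group(3))
        absolute = int(match.group(4))
        errors.append((major, minor, sector, absolute))
    return errors


def partition_offsets(errors):
    """Count how often each (absolute - relative) sector offset occurs."""
    return Counter(absolute - sector for _, _, sector, absolute in errors)


if __name__ == "__main__":
    errors = parse_errors(LOG)
    for offset, count in partition_offsets(errors).items():
        print(f"offset {offset}: {count} error line(s)")
```

All seven lines give the same offset, 32130, so every failure falls inside
one partition starting at that sector. With the usual Linux device
numbering, major 8 minor 2 is the second partition of the first SCSI disk
(/dev/sda2).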

> SCSI bus is being reset for host 0 channel 0 id 0 lun 0 return code = 80000
> scsidisk I/O error : dev 08:02, sector 360484, absolute sector 392614
> scsi : aborting command due to timeout : pid 2915281, scsi 0, channel 0,
> id 0, lun 0 Write (6) 02 fd f4 02 00
> SCSI bus is being reset for host 0 channel 0
> Kernel panic : aic7xxx: AWAITING_MSG for an SCB that does not have a
> waiting message.

This last message is a bug in the abort/reset code. I plan on getting it
tracked down and fixed within the next hour or so. Shouldn't be that hard
to track, although it would have been easier if you had the verbosity of the
driver turned up so I got more messages (see the drivers/scsi/README.aic7xxx
file for a description of the driver boot options including the verbosity
level).
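
As an aside, the bytes in the quoted timeout line can be decoded by hand.
The sketch below assumes the kernel printed the opcode name ("Write (6)")
followed by the remaining five bytes of the command block, so the opcode
byte 0x0A is put back in front. The field layout is the standard SCSI
WRITE(6) layout; the function name decode_write6 is only illustrative.

```python
WRITE_6_OPCODE = 0x0A


def decode_write6(cdb):
    """Decode a 6-byte SCSI WRITE(6) command descriptor block."""
    if len(cdb) != 6:
        raise ValueError("WRITE(6) CDB must be exactly 6 bytes")
    if cdb[0] != WRITE_6_OPCODE:
        raise ValueError("not a WRITE(6) command")
    # Byte 1: top 3 bits were the LUN in SCSI-2, low 5 bits are LBA bits 20-16.
    lun = cdb[1] >> 5
    lba = ((cdb[1] & 0x1F) << 16) | (cdb[2] << 8) | cdb[3]
    # A transfer length of 0 means 256 blocks for the 6-byte commands.
    blocks = cdb[4] if cdb[4] != 0 else 256
    return {"lun": lun, "lba": lba, "blocks": blocks, "control": cdb[5]}


if __name__ == "__main__":
    # Bytes as logged after "Write (6)"; the opcode is re-added (assumption).
    logged = bytes.fromhex("02 fd f4 02 00")
    cdb = bytes([WRITE_6_OPCODE]) + logged
    print(decode_write6(cdb))
```

Under that assumption the timed-out command was a 2-block write at logical
block 196084 on LUN 0.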

>
> In swapper task - not synching.
>
> Was the aic7xxx driver in this pre kernel replaced with one from the
> development kernels?

No, both drivers (the 2.1 and 2.0 kernel series) were updated with a driver
that's been in use for about 4 months now. However, some things do
sometimes slip through. I apologize for that, but I was unable to duplicate
your exact condition in my test environment, so I didn't catch that one in
my testing.

> Can anyone recommend a stable kernel? I've tried everything... 2.1.9x's
> have networking problems. 2.0.33 has that malloc bug so my machine hangs
> every day. 2.0.30-32 have networking hassles.
>
> I've replaced all of my hardware because I initially thought it was a
> hardware problem. I've replaced my kernel 5 times and I can't get this box
> to run for more than 3 days without crashing. The funny thing is that if I
> stop running Squid, this machine will run for 8 months without so much
> as skipping a beat (it has)...

Machines with bad hardware will do lots of nice things under light load
that they won't do under heavy load. Even though there is a bug in the
aic7xxx driver you were using, there is also a problem with one drive.
I would strongly suggest you look into that as well since, no matter how
good the driver is, if the drive goes comatose during operation or decides
to develop bad sectors, then you're hosed.

> Linux 2.0.34
> Adaptec 2940 card

-- 

Doug Ledford <dledford@dialnet.net> Opinions expressed are my own, but they should be everybody's.
