SCSI fixups in 2.0.2[67]

Dan Merillat (Dan@merillat.org)
Sun, 8 Dec 1996 20:13:18 -0500 (EST)


OK, this is what I saw in 2.0.25, and I was wondering whether it is one of
the things fixed in .26 or .27 (I haven't seen it yet, but it's not very
reproducible).

What happened: I had a SCSI disk that wasn't getting sufficient airflow
(an IDE cable had slipped between the fan and the drive... grr). It
apparently went into a shutdown until it cooled off (nice of it not
to burn up). When attempting to read from the disk, I got:
Dec 8 18:35:31 chaos kernel: scsidisk I/O error: dev 08:11, sector 3407928
Dec 8 18:35:31 chaos kernel: Kernel panic: EXT2-fs panic (device 08:11): ext2_read_inode: unable to read i-node block - inode=426105, block=1703964
Dec 8 18:35:31 chaos kernel:
Dec 8 18:35:31 chaos kernel: SCSI disk error : host 0 channel 0 id 1 lun 0 return code = 28000002
Dec 8 18:35:31 chaos kernel: extra data not valid Current error sd08:11: sense key Not Ready
Dec 8 18:35:31 chaos kernel: Additional sense indicates Logical unit is in process of becoming ready
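
For anyone reading the log above, here is a rough sketch of how the numbers
can be pulled apart. It is an illustration, not kernel code: the helper names
(split_result, sd_disk_index, sd_partition, block_to_sector) are made up for
this example. It assumes the usual Linux layout of the SCSI result word
(status, message, host and driver bytes from low to high), the low/high
nibble split of the driver byte, 16 minors per sd disk, and a 1 KiB ext2
block size. The last one is an inference from the log, since
1703964 * 2 = 3407928.

```c
#include <stdint.h>
#include <stdio.h>

/* The four bytes packed into the 32-bit SCSI "return code". */
struct scsi_result {
    uint8_t status;  /* SCSI status byte from the target      */
    uint8_t msg;     /* message byte                          */
    uint8_t host;    /* host adapter (low-level driver) code  */
    uint8_t driver;  /* mid-level driver code plus suggestion */
};

/* Assumed split of the driver byte: low nibble = driver code,
 * high nibble = suggested action. */
#define DRIVER_MASK  0x0f
#define SUGGEST_MASK 0xf0

/* Hypothetical helper: unpack the result word byte by byte. */
struct scsi_result split_result(uint32_t result)
{
    struct scsi_result r;
    r.status = (uint8_t)(result & 0xff);
    r.msg    = (uint8_t)((result >> 8) & 0xff);
    r.host   = (uint8_t)((result >> 16) & 0xff);
    r.driver = (uint8_t)((result >> 24) & 0xff);
    return r;
}

/* sd devices use major 8 with 16 minors per disk:
 * minor >> 4 picks the disk (0 = sda, 1 = sdb, ...),
 * minor & 15 picks the partition (0 = whole disk). */
unsigned sd_disk_index(unsigned minor) { return minor >> 4; }
unsigned sd_partition(unsigned minor)  { return minor & 0x0f; }

/* Convert a filesystem block number to a 512-byte sector number,
 * relative to the start of the partition. */
unsigned long block_to_sector(unsigned long block, unsigned block_size)
{
    return block * (block_size / 512UL);
}

/* Print the decoded fields of a result word in hex. */
void print_result(uint32_t result)
{
    struct scsi_result r = split_result(result);
    printf("status=0x%02x msg=0x%02x host=0x%02x driver=0x%02x "
           "(code=0x%02x, suggest=0x%02x)\n",
           r.status, r.msg, r.host, r.driver,
           r.driver & DRIVER_MASK, r.driver & SUGGEST_MASK);
}
```

Read this way, 28000002 splits into a status byte of 0x02 (CHECK CONDITION,
which is why sense data follows) and a driver byte of 0x28. If I have the
scsi.h definitions right, that is DRIVER_SENSE (0x08) combined with
SUGGEST_ABORT (0x20). Device 08:11 is printed in hex, so it is major 8,
minor 17: second disk, first partition. The failed sector and the ext2 block
in the panic message are the same place on the disk.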

Which is understandable. Once the drive cooled off, it spun up again,
as expected. Here lies the problem: there were now outstanding requests
for blocks on the disk that hadn't been fulfilled because the disk had been
spun down (which also makes sense). Once the disk spun back up, I could
do things like ls, cat other files, create new files, etc., UNTIL one of the
tests I did hit the same area that was still outstanding when it spun down...
then it hung. The drive was fine, and was accepting and fulfilling requests,
but the outstanding requests were stuck either in the VFS or in the SCSI
driver (BusLogic, BTW).

I tried a number of things to get it to retry those commands (sync, umount,
retrying the file, etc.).

This can be quickly reproduced if you have a junk SCSI disk and are willing
to risk blowing it: you can pull the plug on the disk, then plug it in
again... this will give you the "Logical unit is in process of becoming
ready" condition until it has spun up again. Unfortunately, I don't have
a disk I can play with... and now that the failing disk has enough air
I haven't seen it again. In this case, the hardware wasn't TOTALLY at fault...
we have an error condition that is quite possible to recover from, but it was
locked in some sort of error loop.
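
To make "quite possible to recover from" concrete, here is a minimal sketch
of the kind of policy I have in mind. It is NOT what the 2.0 SCSI layer does,
and every name in it (classify_sense, fake_disk, fake_disk_read,
read_with_retry) is hypothetical. Only the sense values are real: sense key
0x02 is Not Ready, and additional sense 0x04/0x01 is "Logical unit is in
process of becoming ready". The idea is to treat that one combination as
"try again later" with a bounded retry count instead of failing the request.

```c
#include <stdint.h>

enum action { ACT_FAIL, ACT_RETRY_LATER };

#define SENSE_KEY_NOT_READY  0x02  /* SCSI sense key: Not Ready       */
#define ASC_LUN_NOT_READY    0x04  /* additional sense code           */
#define ASCQ_BECOMING_READY  0x01  /* qualifier: becoming ready       */

/* Decide what to do with a failed command, given its sense data.
 * Only the "becoming ready" case is treated as transient here. */
enum action classify_sense(uint8_t key, uint8_t asc, uint8_t ascq)
{
    if (key == SENSE_KEY_NOT_READY &&
        asc == ASC_LUN_NOT_READY &&
        ascq == ASCQ_BECOMING_READY)
        return ACT_RETRY_LATER;
    return ACT_FAIL;
}

/* Stand-in for a drive that is still spinning up: it reports
 * "becoming ready" for a fixed number of attempts, then succeeds. */
struct fake_disk {
    int polls_until_ready;
};

/* Returns 0 on success, -1 on error with sense data filled in. */
int fake_disk_read(struct fake_disk *d,
                   uint8_t *key, uint8_t *asc, uint8_t *ascq)
{
    if (d->polls_until_ready > 0) {
        d->polls_until_ready--;
        *key  = SENSE_KEY_NOT_READY;
        *asc  = ASC_LUN_NOT_READY;
        *ascq = ASCQ_BECOMING_READY;
        return -1;
    }
    *key = *asc = *ascq = 0;
    return 0;
}

/* Issue the read, retrying up to max_retries times while the drive
 * says it is becoming ready. Returns the number of retries that were
 * needed, or -1 if the request had to be failed. */
int read_with_retry(struct fake_disk *d, int max_retries)
{
    uint8_t key, asc, ascq;
    int attempt;

    for (attempt = 0; attempt <= max_retries; attempt++) {
        if (fake_disk_read(d, &key, &asc, &ascq) == 0)
            return attempt;
        if (classify_sense(key, asc, ascq) != ACT_RETRY_LATER)
            return -1;
        /* A real driver would wait here before reissuing the command. */
    }
    return -1;
}
```

The important part is that a request which gives up still returns an error
to its caller rather than staying queued forever, which is what seemed to
happen to the requests that were outstanding when my disk spun down.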

Replies relating to Linux internals only, please... I have already gotten the
disk back online and it won't do this again, but this is a condition that
we should be able to recover from (temporary media unavailability...).

--Dan