Re: Overaggressive failing of disk reads, both LIBATA and IDE

From: Mark Lord
Date: Thu Mar 19 2009 - 23:33:22 EST


Norman Diamond wrote:
> For months I was wondering how a disk could do this:
>   dd if=/dev/hda of=/dev/null bs=512 skip=551540 count=4 # succeeds
>   dd if=/dev/hda of=/dev/null bs=512 skip=551544 count=4 # succeeds
>   dd if=/dev/hda of=/dev/null bs=512 skip=551540 count=8 # fails
>
> It turns out the disk isn't doing that. Linux is. The old IDE drivers did
> it, but with LIBATA the same thing happens to /dev/sda. In later examples
> also, the same happens to /dev/sda as /dev/hda.
..

You can blame me for the IDE driver not doing that properly.
But for libata, it's the SCSI layer.

I've been patching this for years for my clients,
and will be updating the patch soon-ish and trying
again to get it into upstream kernels.

Here's the (now ancient) 2.6.20 version for SLES10:

* * *

Allow SCSI to continue with the remaining blocks of a request
after encountering a media error. Otherwise, it may just fail
the entire request, even though some blocks were fine and needed
by a completely different process than the one that wanted the bad block(s).

Signed-off-by: Mark Lord <mlord@xxxxxxxxx>

--- linux-2.6.16.60-0.6/drivers/scsi/scsi_lib.c 2008-03-10 13:46:03.000000000 -0400
+++ linux/drivers/scsi/scsi_lib.c 2008-03-21 11:54:09.000000000 -0400
@@ -888,6 +888,12 @@
 	 */
 	if (sense_valid && !sense_deferred) {
 		switch (sshdr.sense_key) {
+		case MEDIUM_ERROR:
+			/* Bad sector. Fail it, and then continue the rest of the request. */
+			if (scsi_end_request(cmd, 0, cmd->device->sector_size, 1) == NULL) {
+				cmd->retries = 0; // go around again..
+				return;
+			}
 		case UNIT_ATTENTION:
 			if (cmd->device->removable) {
 				/* Detected disc change. Set a bit
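
For anyone who wants to see the difference between the two behaviours without
reading scsi_lib.c, here is a rough userspace sketch. It is not kernel code
and does not use any kernel API: complete_request(), the policy enum, and the
bad[] / ok[] arrays are all made up for illustration. It simply assumes a
request either fails as a whole when any sector in it is bad (the behaviour
described above), or fails only the bad sector(s) and still delivers the rest
(what the patch aims for).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical policies, named for this illustration only. */
enum policy {
	FAIL_WHOLE_REQUEST,	/* any bad sector fails every sector of the request */
	FAIL_BAD_SECTOR_ONLY	/* only the bad sector(s) fail; the rest complete */
};

/*
 * Model the completion of a read of `count` sectors starting at `start`.
 * bad[n] is true if sector n of the simulated disk is unreadable.
 * On return, ok[i] is true if sector (start + i) was delivered to the caller.
 * Returns the number of sectors delivered.
 */
size_t complete_request(const bool *bad, size_t start, size_t count,
			enum policy policy, bool *ok)
{
	size_t delivered = 0;
	bool any_bad = false;

	/* First pass: does the request touch any bad sector at all? */
	for (size_t i = 0; i < count; i++) {
		if (bad[start + i])
			any_bad = true;
	}

	/* Second pass: decide the fate of each sector under the given policy. */
	for (size_t i = 0; i < count; i++) {
		if (policy == FAIL_WHOLE_REQUEST)
			ok[i] = !any_bad;
		else
			ok[i] = !bad[start + i];

		if (ok[i])
			delivered++;
	}

	return delivered;
}
```

With a single bad sector inside an 8-sector request, the first policy delivers
nothing and the second delivers the other 7 sectors, which matters when those
sectors were merged into the request on behalf of a different process.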