Re: [PATCH] JMicron JM20337 USB-SATA data corruption bugfix - device 152d:2338

From: Robert Hancock
Date: Tue Jul 22 2008 - 04:45:53 EST


Tomas Styblo wrote:
* Robert Hancock <hancockr@xxxxxxx> [Tue, 22 Jul 2008]:
In any case, given that your code apparently fixes the corruption, it seems that srb->result is being set to SAM_STAT_CHECK_CONDITION, but the DID_ERROR and SUGGEST_RETRY flags are not being set. Presumably the SCSI layer then looks at the sense data, says "hmm, nothing to worry about here" and carries on.

That's exactly what I thought was happening, after a cursory look at the SCSI code.

I think we do need something like your patch, though it should likely be moved inside the if (need_auto_sense) check, and I don't see a reason to limit to this device ID only.
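To make the suggestion above concrete, here is a minimal user-space sketch of the idea: once the transport has decided auto-sense was needed (the device flagged the command as failed), sense data that explains nothing is treated as an error in its own right. This is not the actual patch. struct fake_cmd, sense_is_empty() and fixup_after_auto_sense() are hypothetical names, the constants are local stand-ins that should be checked against the kernel's SCSI headers, and OR-ing the flags into the result is just one possible choice.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins for illustration only; the real definitions live in
 * the kernel's SCSI headers and should be checked there. */
#define SAM_STAT_CHECK_CONDITION 0x02
#define DID_ERROR                0x07  /* host byte, shifted left by 16 */
#define SUGGEST_RETRY            0x10  /* driver byte, shifted left by 24 */

/* Hypothetical, simplified model of a SCSI command: only the fields
 * this discussion touches. */
struct fake_cmd {
    uint32_t result;
    uint8_t  sense[18];  /* fixed-format sense data */
};

/* Returns 1 if the sense buffer carries no information at all:
 * sense key, ASC and ASCQ are all zero. */
static int sense_is_empty(const uint8_t *sense)
{
    assert(sense != NULL);
    return (sense[2] & 0x0f) == 0 && sense[12] == 0 && sense[13] == 0;
}

/* Models the point inside the "auto-sense was needed" branch. The
 * command is known to have failed; if the sense data gives no reason,
 * add the error and retry flags instead of trusting the empty sense. */
static void fixup_after_auto_sense(struct fake_cmd *cmd)
{
    cmd->result = SAM_STAT_CHECK_CONDITION;
    if (sense_is_empty(cmd->sense))
        cmd->result |= ((uint32_t)DID_ERROR << 16) |
                       ((uint32_t)SUGGEST_RETRY << 24);
}
```

Nothing in the check depends on the USB vendor or product ID, which is why it would not need to be limited to 152d:2338.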

Thank you. This is a very insidious bug: it doesn't manifest
itself very often, so months of data corruption may pass before you
notice it.

So is there a bug in the chipset, or does the error handling code not follow specifications?

It looks clear to me that it's a bug in the chipset. It's supposed to set some valid sense data if an error occurs, not just set the "failed" flag in the USB storage status word. (Presumably the fact that these errors are occurring in the first place is a bug in itself, though that could be a problem with the enclosure or drive as well.)
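For reference, this is roughly what "valid sense data" would look like to the host. The sketch below decodes fixed-format sense data (response code 0x70 or 0x71), where the sense key sits in the low nibble of byte 2 and the ASC/ASCQ pair in bytes 12 and 13. The struct and function names are hypothetical, and descriptor-format sense data (which lays these fields out differently) is deliberately not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary of the three fields that explain an error. */
struct sense_summary {
    uint8_t key;   /* sense key, e.g. 0x03 = MEDIUM ERROR */
    uint8_t asc;   /* additional sense code */
    uint8_t ascq;  /* additional sense code qualifier */
};

/* Decodes fixed-format sense data. Returns an all-zero summary if the
 * buffer is too short or is not in fixed format. */
static struct sense_summary decode_fixed_sense(const uint8_t *sense, size_t len)
{
    struct sense_summary s = {0, 0, 0};

    assert(sense != NULL);
    if (len < 14)
        return s;
    /* Low 7 bits of byte 0 hold the response code. */
    if ((sense[0] & 0x7f) != 0x70 && (sense[0] & 0x7f) != 0x71)
        return s;

    s.key  = sense[2] & 0x0f;
    s.asc  = sense[12];
    s.ascq = sense[13];
    return s;
}

/* Returns 1 if the sense data actually says something went wrong. */
static int sense_explains_failure(const uint8_t *sense, size_t len)
{
    struct sense_summary s = decode_fixed_sense(sense, len);
    return s.key != 0 || s.asc != 0 || s.ascq != 0;
}
```

A well-behaved device that failed a read might return key 0x03 with ASC/ASCQ 0x11/0x00 (medium error, unrecovered read error). The behaviour described in this thread corresponds to a failed command for which sense_explains_failure() would return 0.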

However, the kernel should be more robust and not ignore the error indication that the device is giving.

I wonder if the company that makes the chipset should be notified
about this problem?

I suppose it wouldn't hurt to let JMicron know about this. I doubt they could do anything for existing chipsets, but it might help them avoid this bug in future designs.