Re: LibPATA code issues / 2.6.15.4
From: David Greaves
Date: Sun Feb 26 2006 - 04:55:01 EST
Mark Lord wrote:
>> sdb: Current: sense key: Medium Error
>> Additional sense: Unrecovered read error - auto reallocate failed
>> end_request: I/O error, dev sdb, sector 398283329
>> raid1: Disk failure on sdb2, disabling device.
>> Operation continuing on 1 devices
>
>
> Oh good, *now* we've gotten somewhere!!
>
> Albert / Jens / Jeff:
>
> The command failing above is SCSI WRITE_10, which is being
> translated into ATA_CMD_WRITE_FUA_EXT by libata.
>
> This command fails -- unrecognized by the drive in question.
> But libata reports it (most incorrectly) as a "medium error",
> and the drive is taken out of service from its RAID.
>
> Bad, bad, and worse.
>
> Libata should really recover from this, by recognizing that
> the command was rejected, and replacing it with a simple
> WRITE_EXT instead. Possibly followed by FLUSH_CACHE.
>
> So.. I've forgotten who put FUA into libata, but hopefully
> it's one of the folks on the CC: list, and that nice person
> can now generate a patch to fix this bug somehow.
Thanks Mark
I'm glad it's a bug and not bad hardware.
I am quite concerned that the basic effect of just booting a practically
vanilla 2.6.16-rc4 like this was to fry my raid array.
Luckily it dropped 2 (of 3) disks so quickly that the event counter was
the same allowing an easy rebuild.
2.6.15 has similar issues but they seem to happen *very* infrequently by
comparison - this hit me several times during a single boot.
Should Linus (cc'ed) hold off on 2.6.16 because of this or not?
David
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/