Re: LibPATA code issues / 2.6.15.4

From: Mark Lord
Date: Sun Feb 26 2006 - 09:02:05 EST


David Greaves wrote:
Mark Lord wrote:

sdb: Current: sense key: Medium Error
Additional sense: Unrecovered read error - auto reallocate failed
end_request: I/O error, dev sdb, sector 398283329
raid1: Disk failure on sdb2, disabling device.
Operation continuing on 1 devices
..
The command failing above is SCSI WRITE_10, which is being
translated into ATA_CMD_WRITE_FUA_EXT by libata.

This command fails -- unrecognized by the drive in question.
But libata reports it (most incorrectly) as a "medium error",
and the drive is taken out of service from its RAID.

Bad, bad, and worse.
..
Thanks Mark

I'm glad it's a bug and not bad hardware.

I am quite concerned that the basic effect of just booting a practically
vanilla 2.6.16-rc4 like this was to fry my raid array.

Luckily it dropped 2 (of 3) disks so quickly that the event counter was
the same allowing an easy rebuild.

2.6.15 has similar issues but they seem to happen *very* infrequently by
comparison - this hit me several times during a single boot.

Should Linus (cc'ed) hold off on 2.6.16 because of this or not?

Well, no doubt whatsoever about it being a "regression",
since the FUA code is *new* in 2.6.16 (not present in 2.6.15).

The FUA code should either get fixed, or removed from 2.6.16.

Cheers
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/