Re: Why does the md/raid subsystem not remap bad sectors in a RAID array?

From: Robert Hancock
Date: Sat Nov 22 2008 - 21:00:28 EST


Justin Piszcz wrote:
I asked this before, but it got kind of clobbered in the VelociRaptor mess:

On a colleague's box:

Aug 02, 2008 12:15.30AM(0x04:0x0023): Sector repair completed: port=7, LBA=0x4A0387F5

SMART Self-test log structure revision number 0
Warning: ATA Specification requires self-test log structure revision number = 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       90%              305  1241745397
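The controller log and the SMART self-test appear to point at the same sector: the 3ware message prints the LBA in hexadecimal, while smartctl prints LBA_of_first_error in decimal. A quick check, assuming both values are raw LBAs on the same physical drive (the one on port 7) rather than an array-relative address:

```python
# LBA from the 3ware "Sector repair completed" message (hexadecimal)
controller_lba = int("0x4A0387F5", 16)

# LBA_of_first_error from the SMART extended self-test (decimal)
smart_lba = 1241745397

print(controller_lba)               # 1241745397
print(controller_lba == smart_lba)  # True
```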

Even though this disk has a bad sector:
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       1

The controller does not drop the drive from the array when it hits an error; the 3ware card "takes care of it" and the user need not worry about it. With md/raid, by contrast, every time it hits a bad sector it breaks the RAID and the array goes degraded. Is this correct? Is something like what 3ware does possible in a software RAID configuration, or is a hardware RAID card required?

Presumably all it's doing is rewriting that sector's contents from the other drive(s) in the array when the read error is detected; software could do that just as well. Drives only remap a bad sector when it is written, because a read failure doesn't necessarily mean the sector is permanently unreadable: it could be caused by environmental factors such as high temperature or vibration.
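The repair described above can be sketched as a toy model of a two-way mirror. This is not md's code or 3ware's firmware; the `Disk` class, its `pending`/`remapped` sets, and `mirror_read` are hypothetical names, and the model simply assumes, as stated above, that a drive reallocates a failing sector only when that sector is written.

```python
class Disk:
    """Toy drive model (hypothetical): sector contents plus the set of
    sectors that currently fail on read."""

    def __init__(self, sectors):
        self.sectors = dict(sectors)   # LBA -> data
        self.pending = set()           # LBAs that currently return read errors
        self.remapped = set()          # LBAs reallocated to spare sectors

    def read(self, lba):
        if lba in self.pending:
            raise OSError(f"read error at LBA {lba}")
        return self.sectors[lba]

    def write(self, lba, data):
        # Assumption: the drive reallocates a pending sector when it is
        # written, not when a read of it fails.
        if lba in self.pending:
            self.pending.discard(lba)
            self.remapped.add(lba)
        self.sectors[lba] = data


def mirror_read(members, lba):
    """Read one sector from a mirror. On a read error, take the data from
    another member and write it back to the member(s) that failed, instead
    of dropping them from the array."""
    failed = []
    for disk in members:
        try:
            data = disk.read(lba)
        except OSError:
            failed.append(disk)
            continue
        for bad in failed:
            bad.write(lba, data)       # the rewrite lets the drive remap
        return data
    raise OSError(f"LBA {lba} unreadable on every member")


if __name__ == "__main__":
    a = Disk({100: b"payload"})
    b = Disk({100: b"payload"})
    a.pending.add(100)                 # simulate a grown bad sector on a

    print(mirror_read([a, b], 100))    # b'payload'
    print(a.remapped, a.pending)       # {100} set()
```

The sketch deliberately leaves out what happens if the write-back itself fails, and how many such repairs a member may need before it should be replaced.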

Simply rewriting the sector seems a bit questionable, though: if a drive in your array is growing read errors, that's not a good sign.


Justin.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/