Re: MD/RAID time out writing superblock

From: Andrei Tanas
Date: Tue Sep 01 2009 - 09:07:23 EST


>>>> The drive might take a longer time like this when doing error handling
>>>> (sector remapping, etc), but then I would expect to see your remapped
>>>> sector count grow.
>>>
>>> Yes, this is a possibility and according to the spec, libata EH should
>>> be retrying flushes a few times before giving up but I'm not sure
>>> whether keeping retrying for several minutes is a good idea either.
>>> Is it?
>> ..
>>
>> Libata will retry only when the FLUSH returns an error,
>> and the next FLUSH will continue after the point where
>> the first attempt failed.
>>
>> But if the drive can still auto-relocate sectors, then the
>> first FLUSH won't actually fail.. it will simply take longer
>> than normal.
>>
>> A couple of those, and we're into the tens of seconds range
>> for time.
>>
>> Still, it would be good to actually produce an error like that
>> to examine under controlled circumstances.
>>
>> Hmm.. I had a drive here that gave symptoms like that.
>> Eventually, I discovered that drive had run out of relocatable
>> sectors, too. Mmm.. I'll see if I can get it back (loaned it out)
>> and perhaps we can recreate this specific scenario on it..
> ..
>
> I checked today, and that drive is no longer available.

Mine errored out again with exactly the same symptoms, this time after only
a few days and with the "tunable" set to 2 sec. I got a warranty replacement
but haven't shipped this one back yet. Let me know if you want it.