Re: MD/RAID time out writing superblock

From: Gabor Gombas
Date: Mon Sep 14 2009 - 09:24:58 EST


On Mon, Sep 14, 2009 at 04:41:56PM +0900, Tejun Heo wrote:

> Because this error is actually seen by the md layer and FLUSH in
> general can't be retried cleanly. On retry, the drive goes on and
> retries the sectors after the point of failure. I'm not sure whether
> FLUSH is actually failing here or it's a communication glitch. At any
> rate, if FLUSH is failing or timing out, the only right thing to do is
> to kick it out of the array, as keeping it after retrying may lead to
> silent data corruption.

Hmm, how's that supposed to work with TLER (Time-Limited Error Recovery)
on WD enterprise drives? Isn't the idea behind TLER to prevent drives
from being kicked out of the array, because the RAID system can have
much more intelligent retry/recovery logic than a single drive?

AFAIK md RAID can already take advantage of TLER if the operation that's
failing due to TLER is a READ, but I don't know what happens if TLER
kicks in during a WRITE or a FLUSH.
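
To make the question concrete, here is a toy sketch in Python of the
policy as I understand it from this thread. The names (Op, Action,
on_member_error) are made up for illustration and do not correspond to
anything in the md code. The READ case is only my understanding, the
FLUSH case is Tejun's argument quoted above, and WRITE is deliberately
left as unknown because that is exactly what I am asking about.

```python
from enum import Enum


class Op(Enum):
    READ = "read"
    WRITE = "write"
    FLUSH = "flush"


class Action(Enum):
    RECOVER_FROM_REDUNDANCY = "recover from redundancy"
    KICK_FROM_ARRAY = "kick from array"
    UNKNOWN = "unknown"


def on_member_error(op: Op) -> Action:
    """Toy policy table for an error reported by one array member.

    Hypothetical illustration only; this is not md's actual logic.
    """
    if op is Op.READ:
        # A quick TLER-style failure lets the RAID layer fetch the data
        # from the other members instead of waiting on the drive.
        return Action.RECOVER_FROM_REDUNDANCY
    if op is Op.FLUSH:
        # Tejun's argument: a retried FLUSH continues with the sectors
        # after the point of failure, so a later success says nothing
        # about the sectors that failed.
        return Action.KICK_FROM_ARRAY
    # WRITE: the open question in this mail.
    return Action.UNKNOWN
```

For example, on_member_error(Op.FLUSH) gives Action.KICK_FROM_ARRAY,
while on_member_error(Op.WRITE) gives Action.UNKNOWN.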

Gabor

--
---------------------------------------------------------
MTA SZTAKI Computer and Automation Research Institute
Hungarian Academy of Sciences
---------------------------------------------------------