Re: [RFC]transient transport error report for LLD timeout

From: James Bottomley
Date: Mon Sep 27 2004 - 09:08:17 EST

Next message: Andrea Arcangeli: "Re: mlock(1)"
Previous message: Micha Feigin: "Re: [BUG: 2.6.9-rc2-bk11] input completely dead in X"
In reply to: Rik van Riel: "Re: [RFC]transient transport error report for LLD timeout"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Thu, 2004-09-23 at 19:32, Masao Fukuchi wrote:
> Therefore, I added a new timer in the block layer.
> When it detects a timeout, it responds to the upper (RAID/multipath) layer,
> and the upper layer begins a retry operation using the alt-disk/alt-path.
> Resources used in the block and SCSI (LLD) layers are freed when it receives
> the response from the LLD (SCSI) layer.

I really don't think this is the correct thing to do. The block layer
has no idea what's actually happened to the command, but it's going to
complete it anyway. Depending on what goes on above, this may unpin the
user pages that the scsi_cmnd is still using. Also, the two timers
(the scsi command timer and this new block layer timer) are critically
coupled. Tune them wrongly and nasty things will happen.
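
To make the unpinning problem concrete, here is a rough userspace sketch,
not kernel code. Every name in it (pinned_buf, pinned_cmd, block_timeout
and so on) is hypothetical and only models the ownership question: the
block layer timer completes the request upward while the host still holds
the command.

```c
#include <stdbool.h>

/* Hypothetical stand-ins; these are not the kernel's structures. */
struct pinned_buf {
    int pins;                 /* > 0 while the user pages are pinned */
};

struct pinned_cmd {
    struct pinned_buf *buf;   /* data buffer the host may still DMA into */
    bool host_owns;           /* true until the LLD hands the command back */
};

/* Upper-layer completion: drops the pin on the user pages. */
static void upper_complete(struct pinned_buf *buf)
{
    buf->pins--;
}

/* Block layer timer as proposed: completes without asking the host. */
static void block_timeout(struct pinned_cmd *cmd)
{
    upper_complete(cmd->buf);
}

/* True if the host could still touch memory that is no longer pinned. */
static bool dma_hazard(const struct pinned_cmd *cmd)
{
    return cmd->host_owns && cmd->buf->pins == 0;
}
```

In this model, calling block_timeout() on a command with host_owns still
set leaves dma_hazard() true, which is the state to avoid.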

The right thing to do is to see the no retry flag in error recovery and
complete the command when we see the host has relinquished it. Then go
on to do transport recovery with a new command.
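
As a sketch of that ordering only (again a userspace model with
hypothetical names, not the SCSI error handler's real interface): error
recovery first waits for the host to give the command back, and only then
looks at the no retry flag to decide between completing with an error and
retrying. The retry branch is my assumption about the case where the flag
is not set.

```c
#include <stdbool.h>

/* Hypothetical model; names and fields are illustrative, not kernel API. */
enum eh_action {
    EH_WAIT_FOR_HOST,    /* host still holds the command: do not complete */
    EH_COMPLETE_ERROR,   /* complete upward so the upper layer can fail over */
    EH_RETRY             /* assumed default when no retry is not requested */
};

struct eh_cmd {
    bool no_retry;       /* upper layer (RAID/multipath) asked for fast failure */
    bool host_owns;      /* host adapter has not relinquished the command */
};

/* Called when the LLD reports that the host has given the command back. */
static void eh_host_relinquished(struct eh_cmd *cmd)
{
    cmd->host_owns = false;
}

/* Decide what error recovery may do with a timed-out command. */
static enum eh_action eh_decide(const struct eh_cmd *cmd)
{
    if (cmd->host_owns)
        return EH_WAIT_FOR_HOST;
    if (cmd->no_retry)
        return EH_COMPLETE_ERROR;
    return EH_RETRY;
}
```

Transport recovery would then be issued as a separate, new command after
EH_COMPLETE_ERROR, so nothing upstream is still tied to the failed one.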

James

