RE: PATCH: Further aacraid work

From: Byron Stanoszek
Date: Tue Jun 29 2004 - 14:08:36 EST


On Tue, 29 Jun 2004, Salyzyn, Mark wrote:

I believe this nails the problem too.

However, there is a corner case lurking here (see my currently unanswered
email "error recovery and command completion" on linux-scsi), where I try to
deal with completing a command while error recovery is triggered. scsi_done
will return having done *nothing*, effectively losing the command completion.

MarkH, I had talked to you about the addition of the scsi_add_timer call
before calling scsi_done to address this condition. I do not believe
this to be the (reliable and/or performance-oriented) solution.

Sincerely -- Mark Salyzyn
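
The corner case Mark describes can be pictured with a small user-space model.
This is only a sketch under stated assumptions, not the kernel code: it assumes
the completion path declines to finish a command whose timeout timer is no
longer armed (because error recovery has taken the command over), and that
re-arming the timer first is the workaround being discussed. Every sim_* name
below is hypothetical and invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a SCSI command (not struct scsi_cmnd). */
struct sim_cmd {
    bool timer_pending;  /* true while the command's timeout timer is armed */
    bool completed;      /* true once the completion has been delivered */
};

/* Model of deleting the timer: succeeds only if the timer was still armed. */
bool sim_delete_timer(struct sim_cmd *cmd)
{
    bool was_pending = cmd->timer_pending;
    cmd->timer_pending = false;
    return was_pending;
}

/* Model of the timeout firing: error recovery now owns the command. */
void sim_timeout_fires(struct sim_cmd *cmd)
{
    cmd->timer_pending = false;
}

/* Model of the workaround: re-arm the timer before completing. */
void sim_add_timer(struct sim_cmd *cmd)
{
    cmd->timer_pending = true;
}

/* Model of the completion path (assumption: it bails out if the timer is gone). */
void sim_done(struct sim_cmd *cmd)
{
    assert(cmd != NULL);
    if (!sim_delete_timer(cmd))
        return;              /* completion is silently dropped */
    cmd->completed = true;
}

/* Case 1: the driver completes the command before any timeout. */
bool sim_normal_completion(void)
{
    struct sim_cmd cmd = { .timer_pending = true, .completed = false };
    sim_done(&cmd);
    return cmd.completed;
}

/* Case 2: the timeout fires first, then the driver tries to complete. */
bool sim_lost_completion(void)
{
    struct sim_cmd cmd = { .timer_pending = true, .completed = false };
    sim_timeout_fires(&cmd);
    sim_done(&cmd);
    return cmd.completed;
}

/* Case 3: same as case 2, but the timer is re-armed before completing. */
bool sim_rearm_then_complete(void)
{
    struct sim_cmd cmd = { .timer_pending = true, .completed = false };
    sim_timeout_fires(&cmd);
    sim_add_timer(&cmd);
    sim_done(&cmd);
    return cmd.completed;
}
```

In this model, case 2 is the lost completion, and case 3 shows why re-arming
the timer makes the completion go through. The model has no concurrency, so it
says nothing about whether the workaround is reliable or fast when error
recovery is running at the same time, which is the concern raised above.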

I've tested out both patches sent to me.

Test 1: aacraid-1.1.5-2245.tgz

Works flawlessly and speedily! The rsync completes, and doing a sync() (as
called during a normal lilo update) takes roughly 1 second as opposed to 20
with the original aacraid patch from Alan Cox. Also, no SCSI hang message ever
appears.

Test 2: Mark Haverkamp's linit.c patch

The "SCSI hang" console message appears just as before during the 'rsync',
however (unlike before) the device is still usable for roughly 30 seconds after
the problem. During these 30 seconds, the 'rsync' process is hung, but I can
still do a 'df', 'ls', and so on. After 30 seconds, the entire /dev/sda locks
up and I have no choice but to reboot the system.

-Byron
