Re: aic7xxx sets CDR offline, how to reset?

From: Justin T. Gibbs (gibbs@scsiguy.com)
Date: Wed Sep 04 2002 - 11:50:09 EST


> dledford@redhat.com said:
>> Now, granted, that is more complex than going straight to a BDR, but I
>> have to argue that it *isn't* that complex. It certainly isn't the
>> nightmare you make it sound like ;-)
>
> It's three times longer even in pseudocode...

To make this work, you really need to use the QErr bit in the
disconnect/reconnect page and/or ECA or ACA. QErr I believe is
well supported in devices, but ECA (pre SCSI-3) and ACA most
likely receive very little testing.

I will also voice my opinion (again) that watchdog timer recovery
is in the wrong place in Linux. It belongs in the controller drivers:

1) Only the controller driver knows when to start the timeout
2) Only the controller driver knows the current status of the bus/transport
3) Only the controller can close timeout/just completed races
4) Only the controller driver knows the true transport type
   (SPI/FC/ATA/USB/1394/IP) and what recovery actions are appropriate
   for that transport type given the capabilities of the controller.
5) The algorithm for recovery and maintaining order becomes quite simple:
        1) Freeze the input queue for the controller
        2) Return all transactions unqueued to a device to the mid-layer
        3) Perform the recovery actions required
        4) Unfreeze the controller's queue
        5) Device type driver (sd, cd, tape, etc) decides what errors
           at what rates should cause the failure of a device. The
           controller driver just needs to have the error codes so
           it can honestly and fully report to the type driver what
           really happens so it can make good decissions

   This of course assumes that all transactions have a serial number and
   that requeuing transactions orders them by serial number. With QErr
   set, the device closes the rest if the barrier race for you, but even
   without barriers, full transaction ordering is required if you don't
   want a read to inadvertantly pass a write to the same location during
   recovery.

   For prior art, take a look at FreeBSD. In the worst case, where
   escalation to a bus reset is required, recovery takes 5 seconds.

--
Justin
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Sat Sep 07 2002 - 22:00:22 EST