Re: High-availability question

Gerard Roudier (groudier@club-internet.fr)
Tue, 27 Jul 1999 20:21:59 +0200 (MET DST)


On Mon, 26 Jul 1999, Stephen C. Tweedie wrote:

> Hi,
>
> On Fri, 23 Jul 1999 13:32:37 +0200, Kurt Garloff <garloff@suse.de> said:
>
> >> Try this with most SCSI card/driver combinations and you will have
> >> trouble. The bus reset performed by the second SCSI card to start up
> >> will screw up the operation of the first SCSI card.
>
> > I can not confirm the problem with SCSI resets.
> > Basically, a driver has to detect, when a SCSI reset occurs.
>
> Yep. I believe the Buslogic drivers have been fine in this respect ro
> ages, and afaik the Adaptec ones should also work OK. What is missing
> is systematic testing of these configurations, of course.

What do you mean here? Is it that most other SIMs are unaware of SCSI BUS
resets they donnot initiate (raised by another devices on the BUS) ?

I suspected guys that work on the driver for the CPU and MEMORY to have
much to learn about IOs, but I didn't think I was so right.

> > Ii has to tell the mid-layer that all commands already sent were aborted
> > (DID_RESET) and reset it's negotiation params (Sync, Wide). That's all,
> > basically.
>
> In principle it's not really different fromt he retry work we already
> have to do on a host-originated bus reset.

The way the linux scsi layer resets and retries commands if full stupid.
The fact that the device drivers (or applications for raw IOs) are not
notified of RESETS when this condition cannot be reported by a command
that fails does not allow any reasonnable recovery.

The recovery is to be performed by the application client (basically the
device driver) that must have some handle on events that actually happen
and involve the device. So, all the recovery stuff that is done by the
scsi driver is exactly the wrong thing to do regarding recovery.

The mid-layer must be some kind of transparent layer when recovery is to
be performed. Doing different is (has been) a serious design bug.

> > I see another problem: The box which mounts the FS read-only will have
> > inconcsistent buffers and you'll probably see lots of fs warnings.
>
> Yes. Unless you are using raw IO with some synchronisation protocol
> running between the two hosts, you need to make sure that only one host
> has the device mounted at once.

Gérard.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/