Re: Linux Cluster using shared scsi

From: Eric Z. Ayers (Eric.Ayers@intec-telecom-systems.com)
Date: Thu May 03 2001 - 14:57:41 EST


Pavel Machek writes:
> Hi!
>
> > > > ...
> > > >
> > > > If told to hold a reservation, then resend your reservation request once every
> > > > 2 seconds (this actually has very minimal CPU/BUS usage and isn't as big a
> > > > deal as requesting a reservation every 2 seconds might sound). The first time
> > > > the reservation is refused, consider the reservation stolen by another machine
> > > > and exit (or optionally, reboot).
> > >
> > > Umm. Reboot? What do you think this is? Windoze?
> >
> > It's the *only* way to guarantee that the drive is never touched by more than
> > one machine at a time (notice, I've not been talking about a shared use drive,
> > only one machine in the cluster "owns" the drive at a time, and it isn't for
> > single transactions that it owns the drive, it owns the drive for as long as
> > it is alive, this is a limitation of the filesystes currently available in
> > mainstream kernels). The reservation conflict and subsequent reboot also
> > *only* happens when a reservation has been forcefully stolen from a
> >machine.
>
> I do not believe reboot from kernel is right approach. Tell init with
> special signal, maybe; but do not reboot forcefully. This is policy;
> doing reboot might be right answer in 90% cases; that does not mean
> you should do it always.
...

However distateful it sounds, there is precedent for the
behavior that Doug is proposing in commercial clustering
implementations. My recollection is that both Compaq TruCluster and
HP Service Guard have logic that will panic the kernel when a disk is
"stolen" from under a running service and there is a "network
partition" in the cluster.

A network partition occurs when multiple machines in the cluster are
runnig, but the HA software agents on two nodes can't communicate via
the network to arbitrate which node should be the owner of the disk.

-Eric

--
Eric Z. Ayers				              Lead Software Engineer
Phone:  +1 404-705-2864                    Computer Generation, Incorporated
Fax:    +1 404-705-2805                     an Intec Telecom Systems Company
Web:    http://www.intec-telecom-systems.com/
Email:  eric.ayers@intec-telecom-systems.com
Postal: Bldg G 4th Floor, 5775 Peachtree-Dunwoody Rd, Atlanta, GA 30342 USA
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Mon May 07 2001 - 21:00:18 EST