Re: [PATCH 1b/7] dlm: core locking

From: Lars Marowsky-Bree
Date: Thu Apr 28 2005 - 08:01:28 EST

On 2005-04-27T22:26:38, David Teigland <teigland@xxxxxxxxxx> wrote:

> > So in effect, the delivery of the suspend/membership distribution/resume
> > events are three cluster-wide barriers?
> >
> > I can see how that simplifies the recovery algorithm.
> Correct. I actually consider it two external barriers: the first after
> the lockspace has been suspended, the second after lockspace recovery is
> completed.

Hmmm. This is actually slightly different from what I thought you were

Actually, is that several phase step really necessary? Given that
failures can occur at any given point even then - during each barrier
and in-between - couldn't we just as well deliver the membership event
directly, and proceed with recovery as if that was the "final" state?
We always have to deal with nodes failing, rejoining etc at any given
step and eventually restarting the algorithm if needed.

(Well, a node joining you could serialize and only do that after you
have completed the recovery steps for the event before. But a failing
node during recovery seems to imply the need to restart the algorithm
anyway, and that's just what would happen if a new membership event was

Does it really simplify the recovery, or does it just obscure the
complexity, ie, snake oil?

That said, the model can be mapped as-is quite directly to how the
heartbeat 2.x handles resources which can be active more than once. The
first barrier would be the "we lost a node and are about to fence one of
your incarnations" (or "a node joined and we're about to start one"),
and the second one would be the "we fenced node X" or "we started you on
node X".

However, there's one property here: We assume that those notifications
_can never fail_; they are delivered (and guaranteed to be before we
commence the main operation), and that's it. Can a node in your model
choose to reject the suspend/resume operation?

> > And, I assume that the delivery of a "node down" membership event
> > implies that said node also has been fenced.
> Typically it does if you're combining the dlm with something that requires
> fencing (like a file system). Fencing isn't relevant to the dlm itself,
> though, since the dlm software isn't touching any storage.

Ack. Good point, I was thinking too much in terms of GFS/OCFS2 here ;-)

> > Just some food for thought how this all fits together rather neatly.
> Interesting, and sounds correct. I must admit that using the word "lock"
> to describe these CRM-level inter-dependent objects is new to me.

It's locks with dependencies, instead of one "lock" per resource group.
That's been mulling on the back of my mind ever since Stephen gave me
the DLM-centric-clustering-world talking to at Linux Kongress 98, I
think ;-) By now I think the model fits.

Lars Marowsky-Brée <lmb@xxxxxxx>

High Availability & Clustering
SUSE Labs, Research and Development
SUSE LINUX Products GmbH - A Novell Business

To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at
Please read the FAQ at