Re: [PATCH 1b/7] dlm: core locking

From: Steven Dake
Date: Tue Apr 26 2005 - 12:47:35 EST

On Mon, 2005-04-25 at 22:49, David Teigland wrote:
> On Mon, Apr 25, 2005 at 02:54:58PM -0700, Steven Dake wrote:
> > On Mon, 2005-04-25 at 09:58, David Teigland wrote:
> > > The core dlm functions. Processes dlm_lock() and dlm_unlock() requests.
> > > Creates lockspaces which give applications separate contexts/namespaces in
> > > which to do their locking. Manages locks on resources' grant/convert/wait
> > > queues. Sends and receives high level locking operations between nodes.
> > > Delivers completion and blocking callbacks (ast's) to lock holders.
> > > Manages the distributed directory that tracks the current master node for
> > > each resource.
> > >
> >
> > David
> >
> > It's very positive to see some submissions relating to cluster kernel work
> > for lkml to review. Good job.
> >
> > I have some questions on the implementation:
> >
> > It appears as though a particular processor is identified as the "lock
> > master" or processor that maintains the state of the lock. So for
> > example, if a processor wants to acquire a lock, it sends a request to
> > the lock master which either grants or rejects the request for the
> > lock. What happens in the scenario where a lock master leaves the
> > current configuration? This scenario is very likely in practice.
> Of course, every time a node fails.
> > How do you synchronize the membership events that occur with the
> > kernel-to-kernel communication that takes place using SCTP?
> SCTP isn't much different from TCP, so I'm not sure how that's relevant.
> It's used primarily so we can take advantage of multi-homing when you have
> redundant networks.
> When the membership of a lockspace needs to change, whether adding or
> removing a node, activity is suspended in that lockspace on all the nodes
> using it. After all are suspended, the lockspace is then told (on all
> lockspace members) what the new membership is. Recovery then takes place:
> new masters are selected and waiting requests redirected.
> > It appears from your patches there is some external (userland)
> > application that maintains the current list of processors that qualify
> > as "lock servers".
> correct
> > Is there then a dependence on external membership algorithms?
> We simply require that the membership system is in agreement before the
> lockspace is told what the new members are. The membership system
> ultimately drives the lockspace membership and we can't have the
> membership system on different nodes telling the dlm different stories
> about who's in/out.
> So, yes, the membership system ultimately needs to follow some algorithm
> that guarantees agreement. There are rigorous, distributed ways of doing
> that (your evs work which I look forward to using), and simpler methods,
> e.g. driving it from some single point of control.
> > What userspace application currently configures the dlm services in the
> > posted patch?
> I've been using the command line program "dlm_tool" where I act as the
> membership system myself. We're just putting together pieces that will
> drive this from a membership system (like openais). Again, the pieces you
> decide to use in userspace are flexible and depend on how you want to use
> the dlm.
> > With the use of the SCTP protocol, there now seems to be some idea of
> > moving the protocol for cluster communication into the kernel and using
> > SCTP as that protocol...
> Neither SCTP nor the dlm are about cluster communication, they're both
> about simple point-to-point messages. When you move up to userspace and
> start talking about membership, then the issue of group communication
> models comes up and your openais/evs work is very relevant. Might you be
> misled about what SCTP does?
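The recovery sequence described above (suspend the lockspace on all members, install the agreed membership, select new masters, redirect waiting requests) can be sketched as a small single-process Python model. It is only an illustration under stated assumptions: the Lockspace and Resource classes, pick_master() and recover() are hypothetical names, and choosing a new master by hashing the resource name over the sorted member list is an assumed policy, not necessarily what the dlm patch does. What it shows is that once every node holds the same membership, each node can compute the same new master without further negotiation.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class Resource:
    name: str
    master: int                                  # node id currently mastering this resource
    waiting: list = field(default_factory=list)  # node ids with queued requests


@dataclass
class Lockspace:
    members: list
    resources: dict
    suspended: bool = False


def pick_master(name, members):
    # Hypothetical policy: hash the resource name over the agreed, sorted
    # membership so every node derives the same master independently.
    return members[zlib.crc32(name.encode()) % len(members)]


def recover(ls, new_members):
    """Return (resource, requester, new_master) for each redirected request."""
    # 1. Suspend activity. In a real cluster this is a barrier across all
    #    lockspace members; in this model it is just a flag.
    ls.suspended = True
    # 2. Install the membership that the membership system agreed on.
    ls.members = sorted(new_members)
    redirected = []
    for res in ls.resources.values():
        # Requests queued by departed nodes are discarded.
        res.waiting = [n for n in res.waiting if n in ls.members]
        if res.master not in ls.members:
            # 3. Select a new master for each resource whose master left.
            res.master = pick_master(res.name, ls.members)
            # 4. Redirect the surviving waiting requests to that new master.
            redirected.extend((res.name, n, res.master) for n in res.waiting)
    # 5. Resume normal operation.
    ls.suspended = False
    return redirected
```

For example, with members [1, 2, 3], resource "a" mastered by node 2 (waiters 1 and 3) and resource "b" mastered by node 1 (waiter 2), calling recover(ls, [3, 1]) after node 2 leaves gives "a" a new master and redirects both of its waiters, while "b" keeps master 1 and simply drops node 2's queued request.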

I hate to admit ignorance, but I'm not really sure what SCTP does. I
guess it's point-to-point communication like TCP, but with some different
characteristics. What I wanted was some idea of how locking messages
are related to the current membership. I think I understand the system
from your descriptions and from reading the code. One scenario I could
see happening: there are 2 processors, A and B.

B drops out of membership.
A sends a lock request to lock master B (but A doesn't know yet that B
has dropped out of membership).
B gets the lock request, but has already dropped out of membership or
failed in some way.

In this case, the order of lock messages relative to membership changes
is important. This is the essential race behind almost every issue
with distributed systems... virtual synchrony makes this scenario
impossible by ensuring that messages are ordered relative to
membership changes.
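A minimal sketch of one way to make the A/B race above detectable, assuming each membership view carries a generation (epoch) number and every lock request is stamped with the sender's current epoch. Node, LockRequest, install_view() and the "retry"/"granted" results are hypothetical names. This is not virtual synchrony: it does not order messages against membership changes, it only lets the receiver refuse a request sent under a view it no longer holds, so the sender can retry after installing the new view.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LockRequest:
    sender: str
    resource: str
    epoch: int  # membership generation the sender believed was current


class Node:
    def __init__(self, node_id, members, epoch=1):
        self.node_id = node_id
        self.members = set(members)
        self.epoch = epoch
        self.granted = []  # (sender, resource) pairs this node granted as master

    def install_view(self, members, epoch):
        # Called when the (agreed) membership system delivers a new view.
        self.members = set(members)
        self.epoch = epoch

    def send_request(self, resource):
        # Stamp the request with the view the sender currently holds.
        return LockRequest(self.node_id, resource, self.epoch)

    def receive(self, req):
        # A request stamped with a different epoch was sent under a view this
        # node no longer (or does not yet) hold; refuse it instead of granting.
        if req.epoch != self.epoch:
            return "retry"
        self.granted.append((req.sender, req.resource))
        return "granted"
```

Replaying the scenario: B installs view 2 (members {"A"}) first, then A, still in view 1, sends a request for "r1" to B; B answers "retry" and grants nothing. If B has crashed outright, no reply arrives at all and A only learns of it from the new view, so the epoch check covers just the case where B is still alive but has moved on.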

Do you intend to eventually move the point-to-point communication into
userspace, or keep it within the kernel? I'd like to understand whether
there is a general need for cluster communication as a kernel service,
or whether the intent is for all communication to be done in userspace...

You guys have done a good job here...

Can I ask a performance question? How many locks per second can be
acquired and then released on a system of 2-3 processors, both in the
case where the processor requesting the lock is the lock server
processor and in the case where it is not? (A processor in this case is
the system that processes the lock operations.) Assume all locks are
uncontended.


> Dave
