Re: POHMELFS high performance network filesystem. Transactions, failover, performance.

From: Jamie Lokier
Date: Wed May 14 2008 - 10:31:23 EST


Evgeniy Polyakov wrote:
> > For writes, Paxos is actually more or less optimal (in the non-failure
> > cases, at least). Reads are trickier, but there are ways to keep that
> > fast as well. FWIW, Ceph extends basic Paxos with a leasing mechanism to
> > keep reads fast, consistent, and distributed. It's only used for cluster
> > state, though, not file data.
>
> Well, it depends... If we are talking about single node performance,
> then any protocol which requires waiting for authorization (or any
> approach which waits for an acknowledgement just after data was sent)
> is slow.
>
> If we are talking about aggregate parallel performance, then its basic
> protocol with 2 messages is (probably) optimal, but still I'm not
> convinced that the 2 messages case is a good choice, I want one :)

Look up "one-phase commit" or even "zero-phase commit". (The
terminology is cheating a bit.) As I've understood it, all commit
protocols have a step where each node guarantees it can commit if
asked and node failure at that point does not invalidate the guarantee
if the node recovers (if it can't maintain the guarantee, the node
doesn't recover in a technical sense and a higher level protocol may
reintegrate the node). One/zero-phase commit extends that to
guaranteeing that certain amounts and types of data can be written
before the node knows what the data is, so write messages within that
window are sufficient for global commits. Guarantees can be acquired
asynchronously in advance of need, and can have time and other limits.
These guarantees are no different in principle from the 1-bit
guarantee offered by the "can you commit" phase of other commit
protocols, so they aren't as weak as they seem.
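
A minimal sketch of the advance-guarantee idea described above, not
taken from any real filesystem. The names Guarantee, Node, grant and
write are hypothetical, time is passed in explicitly, and durable
reservation of space (which a real node would need so the promise
survives a crash) is assumed rather than modelled. The node promises
ahead of time that it can commit up to max_bytes before a deadline;
a write that fits inside that window is committed by the write message
alone, with no separate "can you commit" round.

```python
from dataclasses import dataclass, field


@dataclass
class Guarantee:
    max_bytes: int      # how much data the node promised it can commit
    expires_at: float   # time limit on the promise
    used: int = 0       # bytes already committed under this promise


@dataclass
class Node:
    guarantees: dict = field(default_factory=dict)  # id -> Guarantee
    log: list = field(default_factory=list)         # committed writes
    next_id: int = 0

    def grant(self, max_bytes, ttl, now):
        # Issued asynchronously, before the writer knows what it will write.
        # Assumption: space for max_bytes is reserved durably at this point.
        gid = self.next_id
        self.next_id += 1
        self.guarantees[gid] = Guarantee(max_bytes, now + ttl)
        return gid

    def write(self, gid, data, now):
        # A write inside the guaranteed window commits on arrival.
        g = self.guarantees.get(gid)
        if g is None or now >= g.expires_at or g.used + len(data) > g.max_bytes:
            return False  # outside the window: fall back to a slower protocol
        g.used += len(data)
        self.log.append(data)
        return True
```

A writer would call grant() well before it needs to write, then send
write() messages without waiting for a prepare reply as long as they
stay within the granted size and time limits.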

Now combine it with a quorum protocol like Paxos: you can commit with
async guarantees from a subset of nodes. Guarantees can be
piggybacked on earlier requests. The result is single node write
performance with quorum robustness.
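
As a rough illustration of the quorum combination, the sketch below
checks whether a writer already holds valid guarantees from a majority
of nodes. It is not Paxos: ballots, leader election and recovery are
left out entirely. The function name quorum_write and the layout of the
guarantees table are assumptions made for this example only.

```python
def quorum_write(guarantees, nbytes, now):
    """Decide a write using guarantees already held by the writer.

    guarantees maps node id -> [remaining_bytes, expires_at].
    Returns the sorted node ids the write is charged to, or None when
    fewer than a majority of nodes have a guarantee covering it.
    """
    need = len(guarantees) // 2 + 1  # simple majority quorum
    valid = sorted(
        node for node, (remaining, expires_at) in guarantees.items()
        if now < expires_at and remaining >= nbytes
    )
    if len(valid) < need:
        return None  # no quorum of guarantees: a slower path is needed
    for node in valid:
        guarantees[node][0] -= nbytes  # consume part of each window
    return valid
```

With three nodes, two live guarantees are enough: the writer can send
the data once and treat it as committed without waiting for a reply,
while fresh guarantees ride back on later responses.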

-- Jamie