Re: Alternate solutions

Kai Henningsen (kai@khms.westfalen.de)
29 Jul 1996 21:53:00 +0200


ecki@inka.de (Bernd Eckenfels) wrote on 29.07.96 in <4th8v6$hgo@nz12.rz.uni-karlsruhe.de>:

> Kai Henningsen <kai@khms.westfalen.de> wrote:
> : It does? Can you substantiate that claim? It certainly looks wrong to me.
>
> Things like 3-phase commits or other schemes only reduce the probability
> that something goes wrong; it is not possible to ensure it. Acks or change
> requests can get lost, links can fail. Timeouts can't ensure that your data
> will be the same on all systems all the time.

Three-phase commits are about scenarios with rollbacks. I don't see that
we need those for a networked file system.

In fact, all these problems seem to be about coordinating several parallel
changes that form a logical unit - that is, you must apply either all of
them or none, or you lose consistency. That doesn't look at all like the
type of situation you have with a distributed file system.

Cache coherency is definitely possible. I wouldn't go so far as to call it
trivial, but it sure doesn't look overly complicated to me.

Yes, links can fail. However, since you don't have a distributed database,
only distributed _access_, a failing link means two things:

* Your application breaks, as it can no longer access its data
* If you had some data cached for writing, the update will be lost

Actually, the exact same thing happens when you switch off the machine -
well, in the failing link case, you _might_ get a notification.

We do need mechanisms to handle failure, as there is no way to keep
failure from happening, but we do not need transaction handling.

In such a scenario, cache coherency might, for example, work like this:

* Possible scenarios:
  1. One client has read-write access to a piece of cache;
     no client has read-only access.
  2. One or more clients have read-only access to a piece of cache;
     no client has read-write access.
  3. No client has access to a piece of cache.

(You might want to handle the server's local fs underlying the net fs as
another client here.)
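To make those three scenarios a bit more concrete, here is a minimal C
sketch of the invariant they describe. All the names in it (cache_block,
block_state_ok, and so on) are made up for illustration; this is not code
from any existing file system.

/* Illustrative sketch of the three access scenarios for one piece of cache. */
#include <assert.h>
#include <stdio.h>

enum cache_access { ACCESS_NONE, ACCESS_READ, ACCESS_WRITE };

#define MAX_CLIENTS 4

/* What each client currently holds for one piece of cache. */
struct cache_block {
    enum cache_access held[MAX_CLIENTS];
};

/* The invariant behind scenarios 1-3: either exactly one writer and no
 * readers, or any number of readers and no writer, or nobody at all. */
static int block_state_ok(const struct cache_block *b)
{
    int readers = 0, writers = 0;

    for (int c = 0; c < MAX_CLIENTS; c++) {
        if (b->held[c] == ACCESS_READ)  readers++;
        if (b->held[c] == ACCESS_WRITE) writers++;
    }
    return (writers == 1 && readers == 0) || writers == 0;
}

int main(void)
{
    /* Scenario 2: two readers, no writer. */
    struct cache_block b = { .held = { ACCESS_READ, ACCESS_READ } };

    assert(block_state_ok(&b));
    printf("scenario 2 holds: multiple readers, no writer\n");
    return 0;
}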

Clients can always simply give up their access; in the case of write
access, they can optionally offer new contents for the cache.

Clients that need increased access can ask the server, and the server can
ask a client to give up or reduce its access.
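The upgrade path might then look roughly like the following sketch, where
revoke_access() and request_write() are invented placeholders for whatever
messages the real protocol would use:

/* Hypothetical server-side sketch of the grant/revoke exchange. */
#include <stdio.h>

enum cache_access { ACCESS_NONE, ACCESS_READ, ACCESS_WRITE };

#define MAX_CLIENTS 8

struct server_block {
    enum cache_access held[MAX_CLIENTS];   /* what each client holds */
};

/* Pretend RPC: ask a client to drop back to 'level'; returns 0 on success. */
static int revoke_access(int client, enum cache_access level)
{
    printf("asking client %d to reduce access to %d\n", client, level);
    return 0;   /* in reality this can time out (failed link, handled below) */
}

/* A client asks for write access: first get every other holder down
 * to ACCESS_NONE, then hand out the exclusive grant. */
static int request_write(struct server_block *b, int client)
{
    for (int c = 0; c < MAX_CLIENTS; c++) {
        if (c != client && b->held[c] != ACCESS_NONE) {
            if (revoke_access(c, ACCESS_NONE) != 0)
                return -1;      /* failed link: handled separately */
            b->held[c] = ACCESS_NONE;
        }
    }
    b->held[client] = ACCESS_WRITE;
    return 0;
}

int main(void)
{
    /* Start in scenario 2 (two readers), then client 2 asks to write. */
    struct server_block b = { .held = { ACCESS_READ, ACCESS_READ } };

    request_write(&b, 2);
    return 0;
}

The point of the loop is simply that write access is only granted once
every other holder has dropped to no access, which is what keeps
scenario 1 exclusive.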

When a client fails to do so in a reasonable time, we have a failed link.
(At this point, we might handle this like a crashed application - close
the file access, dropping any related locks.) On the next contact, the
client will be told that its copy of the cache has gone stale and its
access is gone. The client might need to do something about what is
essentially a write I/O error, the same as if the server had gone down.
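On the client side, that could be as simple as the following sketch;
stale_notify() is an assumed entry point, not an existing interface, and
returning -EIO is just one way to surface the lost update to the
application:

/* Hypothetical client-side handling of a "your copy went stale" message. */
#include <errno.h>
#include <stdio.h>

struct client_block {
    int valid;      /* cached copy still usable?   */
    int dirty;      /* unwritten changes present?  */
};

/* Called when the server tells us our copy went stale while we were
 * unreachable: drop the cache and report any lost update as -EIO. */
static int stale_notify(struct client_block *b)
{
    int lost = b->dirty;

    b->valid = 0;
    b->dirty = 0;
    return lost ? -EIO : 0;
}

int main(void)
{
    struct client_block b = { .valid = 1, .dirty = 1 };
    int err = stale_notify(&b);

    printf("stale notification -> %d (%s)\n",
           err, err ? "write lost, report I/O error" : "nothing lost");
    return 0;
}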

If we need to do something synchronously, we can always wait until the
server tells us that it has successfully received the data.
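A small sketch of that synchronous case, with send_block() and wait_ack()
standing in for the real transport (again, invented names, not an existing
API):

/* Hypothetical synchronous write: not complete until the server acks. */
#include <stdio.h>

static int send_block(const char *data)
{
    printf("sent: %s\n", data);
    return 0;
}

static int wait_ack(void)
{
    return 0;   /* block here until the server's acknowledgement arrives */
}

static int write_sync(const char *data)
{
    if (send_block(data) != 0)
        return -1;
    return wait_ack();
}

int main(void)
{
    return write_sync("some dirty cache contents");
}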

In short, I see nothing here to lose sleep over.

MfG Kai