Re: [PATCH] AFS filesystem for Linux (2/2)

From: David Howells (dhowells@cambridge.redhat.com)
Date: Mon Oct 07 2002 - 04:14:19 EST


> Last time I checked, yes. Callbacks should be sent only when the file is
> closed.

Sent by whom?

The client doesn't send callbacks. The server breaks all the callbacks on a
file whenever the contents or metadata of that file are changed. Closing has
nothing to do with it as far as the server is concerned (indeed, there is no
close).
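
For illustration, about all a client can usefully do when the server breaks a
callback is drop its promise and revalidate lazily on the next access. A
minimal sketch (the identifiers are mine, not the patch's):

struct afs_vnode_cache {
	unsigned	vnode_id;	/* which vnode this entry covers */
	int		cb_valid;	/* non-zero => unbroken callback held */
};

/* Invoked when the fileserver calls back into the client (the
 * RXAFSCB_CallBack RPC in AFS-3) to break the promise on this vnode.
 * The server doesn't say what changed, only that our copy may now be
 * stale, so the next access must refetch status/data.
 */
void afs_break_callback(struct afs_vnode_cache *vc)
{
	vc->cb_valid = 0;
}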

> But local modifications are by definition more up-to-date because you
> haven't closed the file yet. It is very much a problem of how you do your
> caching. I'm trying to tell you AFS has a well-defined consistency model
> called session semantics. That directly influences every decision you make
> wrt caching, and everything I've seen up until now shows me that you're
> simply implementing an NFS client that happens to speak to AFS servers.

I'm trying to follow the AFS architecture guide without sacrificing too much
to cache coherency problems - which it sounds like Coda has... You've said it
requires user intervention (possibly automated) to fix up files for which the
cache has become inconsistent with respect to the server.

The difference is between Write-Back and Write-Through caching (to use h/w CPU
caching terminology). What I'm looking at doing is using Write-Through
caching, but I could make it configurable.
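
To put it concretely, the difference at write() time would look roughly like
this (illustrative only - every name here is made up, and error handling is
elided):

#include <sys/types.h>

struct afs_file;	/* opaque here */
void update_page_cache(struct afs_file *f, const char *buf, size_t n);
void mark_dirty(struct afs_file *f, size_t n);
ssize_t afs_rpc_store_data(struct afs_file *f, const char *buf, size_t n);

/* Write-through: the StoreData RPC goes to the server before write()
 * returns, so the server's copy never lags the client's.
 */
ssize_t afs_write_through(struct afs_file *f, const char *buf, size_t n)
{
	update_page_cache(f, buf, n);
	return afs_rpc_store_data(f, buf, n);	/* synchronous RPC */
}

/* Write-back: dirty the page cache now and let StoreData happen later
 * (on flush), trading consistency and prompt error reporting for
 * latency.
 */
ssize_t afs_write_back(struct afs_file *f, const char *buf, size_t n)
{
	update_page_cache(f, buf, n);
	mark_dirty(f, n);		/* StoreData deferred to a flush */
	return (ssize_t)n;
}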

As I've said, Write-Back caching has trickier security issues, particularly
when more than one user is changing the same file on a client, given that the
file can't be "opened" with the server. One way round that is sketched below.
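
That is, tag each dirty range with the key (credentials) that produced it and
issue one StoreData per key at flush time. Entirely hypothetical - none of
this is in the patch:

struct afs_key;		/* a user's authentication token */
struct afs_vnode;

struct afs_dirty_range {
	unsigned long		first, last;	/* dirty page range */
	struct afs_key		*key;	/* credentials that dirtied it */
	struct afs_dirty_range	*next;
};

int afs_store_range(struct afs_vnode *vn, struct afs_dirty_range *r);

/* Flush each range under the key that actually wrote it, so a user
 * whose rights were revoked fails on their own ranges instead of
 * having their changes smuggled out under someone else's identity.
 */
int afs_flush_dirty(struct afs_vnode *vn, struct afs_dirty_range *list)
{
	struct afs_dirty_range *r;
	int ret = 0;

	for (r = list; r; r = r->next) {
		int err = afs_store_range(vn, r);
		if (err && !ret)
			ret = err;	/* remember the first failure */
	}
	return ret;
}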

> Actually AFS is most definitely a stateful filesystem and as such has
> the concept of an open file.

Check the AFS fileserver API. There is no open call (and no close call). You
_can't_ "open" the file on the server. All you can do is fetch/store data or
metadata, which gets you a short-term guarantee that you'll be told if what
you have is invalidated. (Admittedly, there are also calls for creating and
destroying various sorts of files, but they aren't pertinent here.)
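
To summarise the data-access part of that API (names per the AFS-3 fileserver
protocol; I've left the signatures out as they aren't the point):

/* RXAFS_FetchStatus - read a vnode's metadata; obtains/extends a callback
 * RXAFS_FetchData   - read a byte range of a file; ditto
 * RXAFS_StoreStatus - write a vnode's metadata
 * RXAFS_StoreData   - write a byte range of a file
 *
 * ...plus CreateFile/RemoveFile/MakeDir/RemoveDir/Rename/Symlink/Link
 * for namespace manipulation. Nothing resembling open or close.
 */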

Thus, every time you access the file, your credentials have to be effectively
rechecked (the fileserver may cache granted permissions, but that's an
optimisation).

Furthermore, whilst the fileserver will inform you if the mode bits on a vnode
are changed, and probably when the ACL on a directory is changed, it won't
inform you when the group ACEs of that ACL are changed in the protection
server. Therefore, the access rights granted to any particular set of
credentials may change without the client knowing about it - even if it still
holds a valid, unbroken lease.

My point is that it can't be assumed that security details negotiated at
client-side open will still be valid come the first operation on that file.
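
In practice that means the client must be ready for an access failure on any
individual RPC, not just at open time. A sketch, under the same caveat that
all the names here are invented:

#include <errno.h>

struct afs_vnode;
struct afs_key;

long afs_rpc_fetch_data(struct afs_vnode *vn, struct afs_key *key,
			char *buf, unsigned long n, unsigned long pos);
void afs_zap_cached_permits(struct afs_vnode *vn, struct afs_key *key);

long afs_file_read(struct afs_vnode *vn, struct afs_key *key,
		   char *buf, unsigned long n, unsigned long pos)
{
	long ret = afs_rpc_fetch_data(vn, key, buf, n, pos);

	/* The server is the authority: if an ACL or a group in the
	 * protection database changed under us, the check done at
	 * open time means nothing, so discard any permission bits we
	 * cached for this key.
	 */
	if (ret == -EACCES)
		afs_zap_cached_permits(vn, key);
	return ret;
}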

> First of all, typical write-write sharing is extremely low, otherwise
> Coda wouldn't have even worked with its optimistic replication and
> caching. Second of all, permission changes are even rarer.

Rare isn't the same as impossible. If it can happen, it has to be considered
and catered for.

> - We either succeed in writing the file when the person who lost access
> closed before the last writer who still had access. Big deal, he had
> access when he opened the file, and it is next to impossible to tell
> whether his close arrived before or after the permission was taken away
> except when every operation is somehow given an absolute linear
> ordering in time. Generally too expensive on a distributed system
> with clients that may come and go as they please.

So someone else submits the now-denied user's changes, thus allowing them to
bypass security... I'm not arguing that you're necessarily wrong. Some
compromise has to be made; without being able to account to the server for
everyone who has made a change to the data being submitted, and to perform
full database-style rollback and re-execution, the options are limited.

> - Or the person who lost access is the last one to close the file and
> trigger the writeback. In this case the servers will deny the write
> operation and the client will declare a conflict on that object and
> switch to disconnected mode. Someone who has write permission can
> repair the conflict.

This is not an option I can use.

> Actually, for any unmodified data there is only one copy. Once it is
> modified there are two (the dirty page and the logged change), why would
> there be a third, I don't care about the copy in the backing file...

You listed three (not including RAM): swap, logfile, underlying file.

> It has the advantage that even if you pull the powercord, we can still
> recover the last consistent set of metadata and from that validate both the
> data in the cache files, and revalidate everything against the servers,
> which we already need for the disconnected operation.

Does your cache run in synchronous mode with respect to the disc?

It may be that I can improve matters in my caching by using a journal... and
the VM/VFS has hooks for use by journalling filesystems such as EXT3.
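
Something like the following, perhaps - get the journal record onto stable
storage before touching the cache file, so that after a crash the index can
be replayed and anything doubtful revalidated against the server (pure
sketch, all names hypothetical):

struct cache;		/* the cache as a whole, including its journal */

struct cache_journal_rec {
	unsigned	vnode_id;	/* which vnode the extent belongs to */
	unsigned	cache_block;	/* where it lives in the cache file */
	unsigned	data_version;	/* AFS data version it matches */
};

int journal_append_sync(struct cache *c, const struct cache_journal_rec *rec);
int cache_write_block(struct cache *c, unsigned block,
		      const void *data, unsigned len);
int cache_index_commit(struct cache *c, const struct cache_journal_rec *rec);

int cache_update_extent(struct cache *c, struct cache_journal_rec *rec,
			const void *data, unsigned len)
{
	int err;

	/* write-ahead: the record must be stable before the data */
	err = journal_append_sync(c, rec);
	if (err)
		return err;

	err = cache_write_block(c, rec->cache_block, data, len);
	if (err)
		return err;

	return cache_index_commit(c, rec);
}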

David