Re: [PATCH 2/4] AFS: Add a function to excise a rejected writefrom the pagecache

From: Trond Myklebust
Date: Thu May 24 2007 - 18:35:39 EST


On Thu, 2007-05-24 at 22:35 +0100, David Howells wrote:
> Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> > Could you please flesh this out a bit, from a higher level?
>
> See the description for patch 3/4.
>
> > As in: what does it mean for a server to "reject" a write? What's actually
> > going on here?
>
> Simply, it means that the server refused to perform the write. The example I'm
> thinking of is that it refused to do so because the ACL governing that file
> changed between the permission() check and the attempt to write the data back.
>
> All it takes is a write-back cache and a tiny little delay between the copy to
> page cache and the write back to the server and there's a window in which
> another client can leap in and make it impossible for the client to write the
> data.
>
> There are other possibilities, for instance:
>
> (1) The key governing the writeback expires.
>
> (2) The user's account is deleted or disabled.
>
> The file being deleted is handled elsewhere. In such a case, everyone's
> writes are just discarded.
>
> > Why does the server do this?
>
> Because it's told to by some other client. Think chmod, or in AFS's case:
>
> fs setacl -dir . -acl system:administrators rlidka
>
> This removes write access by not including a 'w' amongst 'rlidka'.
>
> (Note in AFS, the ACLs attached to a directory control the files in that
> directory).
>
> > I assume that the application will see a short write or an EIO or something?
>
> EACCES or similar most likely. In AFS's case you should get an abort with some
> appropriate error code. This is then mapped to EACCES, EKEYREVOKED,
> EKEYEXPIRED, etc.
>
> > How does this interact with MAP_SHARED?
>
> MAP_SHARED/PROT_WRITE is just another way of doing writes to the pagecache. It
> isn't really special. The way I've done it is to use page_mkwrite() to track
> the key with which a write through a mapping is written back.
>
> > Do we expect that any other networked filesystem would want to do this?
> > (and if not, why not?) Or do they already attempt to do it in some other
> > fashion?
>
> _Any_ network filesystem that (a) has access controls, and (b) does writeback
> caching (be the interval ever so brief) is liable to this issue. It could
> possible for a netfs to avoid with this by getting a guarantee that the write
> will be accepted upfront (CIFS might do this), or by blocking attempts to
> change the access controls through some sort of locking (again CIFS might do
> this).
>
> However, looking at NFS, it appears that NFS may be liable to this problem. It
> does appear to use writeback caching, though it flushes its writes back fairly
> promptly making it quite difficult to test. I talked to Steve Dickson about
> this, and I get the impression that NFS will just sit their and keep retrying
> the writeback.

No. If the write fails, then NFS will mark the mapping as invalid and
attempt to call invalidate_inode_pages2() at the earliest possible
moment.

I'm adding in a patch to defer marking the page as uptodate until the
write is successful in cases where NFS is writing a pristine page.

As for pages that are already marked as uptodate, well you already have
a race: you have deferred the page write, and so other processes may
already have read the rejected data before you tried to write it out.

Trond

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/