Re: Security issues with local filesystem caching

From: Trond Myklebust
Date: Wed Nov 15 2006 - 09:07:15 EST


On Tue, 2006-11-14 at 19:22 +0000, David Howells wrote:
> Trond Myklebust <trond.myklebust@xxxxxxxxxx> wrote:
>
> > > Avoiding context switches aren't the main problem; avoiding serialisation
> > > is.
> >
> > Why? It is a backing cache. The only case where serialisation ought to
> > bother you is the case where the client has to invalidate the cache due
> > to a server-side update of the file.
>
> Cache invalidation is not so much of a problem as at that point we know exactly
> where whatever it is that we're invalidating is, and if it's a big object we
> just move it somewhere for the userspace daemon to splat.
>
> The serialisation problem is that if we put cache lookup in its own thread,
> then in effect every open[*] of an NFS file, AFS file, whatever, will be
> serialised.
>
> It almost certainly wouldn't matter if what we did was to asynchronously look
> up the cache cookie for a file. In the common case, an open(O_RDONLY) syscall
> is followed almost immediately by a read(), so there's not much to be gained
> from asynchronising things as the cache cookie has to be available by the time
> we come to process the read, but we can't get the cache cookie before
> completing the server checks made by open, as we need the coherency data before
> attempting to acquire a cookie.

No. All you should need is the result of the lookup(). The coherency
data needs to be checked against an eventual existing CacheFiles entry
during the call to read(), not before.

> The serialisation would stem from having to do several synchronous filesystem
> ops for each cache lookup, but only having one thread in which to do them.
> Okay, I could have several worker threads, but why? Each process attempting to
> access the cache provides me with a suitable worker thread, and then I can have
> as many as there are tasks on the system.

Umm...because the former is a model which actually fits your security
requirements (i.e. one privileged daemon gets lookup()+open()+mkdir()...
rights on the CacheFiles partition), whereas the latter is not (all
tasks need lookup()+open()+mkdir().... privileges)?

> [*] Note that for NFS I've now incorporated a patch from Steve Dickson to
> acquire file cookies on the NFS open() file op, rather than during iget because
> NFS readdir calls iget.
>
> > Once the RPC calls have been launched, the process returns to the VM
> > layer and just waits for the next page to be unlocked. It never returns
> > to the filesystem layer. So where are you using the process context to
> > write out the cached data?
>
> What do you mean by "write out the cached data"? Do you mean write the data to
> the cache?
>
> If so, that'd be nfs_readpage_to_fscache() as called from nfs_readpage_sync()
> or nfs_readpage_release().

nfs_readpage_release() is an rpciod context, _not_ a user thread
context.

> That calls fscache_write_page() which calls cachefiles_write_page() which calls
> generic_file_buffered_write_one_kernel_page().
>
> That last copies the data into the pagecache attached to an ext3 inode to be
> written out (hopefully) asynchronously.
>
> However, that may do other disk accesses, I suppose, as it calls
> prepare_write() and commit_write() on ext3.

Which would generally be forbidden under the rpciod context, BTW, since
they imply calls to generic memory allocation (== nasty tricksy deadlock
potential, since rpciod may be called upon to help write out NFS pages
via shrink_page_list and friends).

> I could try and make it asynchronous, but that means more overhead in other
> ways:-( I presume this will then sometimes be running in rpciod context?
>
> > The cookie lookups need to be synchronous, but why would the file
> > creation need to be synchronous? Creating the cachefs file and waiting
> > on that to complete etc are all utterly useless activities as far as
> > satisfying the user request for data goes. Just start the process of
> > creating a backing file, and then get on with the actual syscall.
>
> vfs_mkdir() is synchronous. vfs_create() is synchronous. vfs_[sg]etxattr is
> synchronous. Lookup is synchronous.

All of them are synchronous as far as accessing the remote filesystem is
concerned. Why would the user process care if a privileged daemon has
completed the shadow mkdir() or create() on the CacheFiles system or
not?

> Yes, I could make them all asynchronous, but it'd be a lot more work, and
> mostly unnecessary, and I'd probably have to fight down lots of objections.
>
>
> Remember: in the common case, open(O_RDONLY) is going to be followed quickly by
> a read(). I suppose there may be an intervening stat() and malloc(), but even
> so...

Which is why lookup() + open() + read() needs to be fast in the case
where you have a CacheFiles hit. It does not justify mkdir, create, etc
being fast, nor does it justify the open() + read() part needing to be
fast in the case of a CacheFiles miss.

Trond

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/