Re: 9p fs-cache tests/benchmark (was: 9p fscache Duplicate cookie detected)

From: Christian Schoenebeck
Date: Fri Apr 01 2022 - 10:19:39 EST


On Mittwoch, 30. März 2022 23:47:41 CEST asmadeus@xxxxxxxxxxxxx wrote:
> Thanks Christian!
>
> Christian Schoenebeck wrote on Wed, Mar 30, 2022 at 02:21:16PM +0200:
[...]
> > > Christian Schoenebeck wrote on Sat, Mar 26, 2022 at 01:36:31PM +0100:
> > > hm, fscache code shouldn't be used for cache=mmap, I'm surprised you can
> > > hit this...
> >
> > I assume that you mean that 9p driver does not explicitly ask for fs-cache
> > being used for mmap. I see that 9p uses the kernel's generalized mmap
> > implementation:
> >
> > https://github.com/torvalds/linux/blob/d888c83fcec75194a8a48ccd283953bdba7
> > b2550/fs/9p/vfs_file.c#L481
> >
> > I haven't dived further into this, but the kernel has to use some kind of
> > filesystem cache anyway to provide the mmap functionality, so I guess it
> > makes sense that I got those warning messages from the FS-Cache
> > subsystem?
> It uses the generic mmap which has vfs caching, but definitely not
> fs-cache.
> fs-cache adds more hooks for cachefilesd (writing file contents to disk
> for bigger cache) and things like that cache=loose/mmap shouldn't be
> caring about. cache=loose actually just disables some key parts so I'm
> not surprised it shares bugs with the new code, but cache=mmap is really
> independant and I need to trace where these come from...

From looking at the sources, the call stack for emitting "FS-Cache: Duplicate
cookie detected" error messages with 9p "cache=mmap" option seems to be:

1. v9fs_vfs_lookup [fs/9p/vfs_inode.c, 788]:

inode = v9fs_get_new_inode_from_fid(v9ses, fid, dir->i_sb);

2. v9fs_get_new_inode_from_fid [fs/9p/v9fs.h, 228]:

return v9fs_inode_from_fid_dotl(v9ses, fid, sb, 1);

3. v9fs_inode_from_fid_dotl [fs/9p/vfs_inode_dotl.c, 157]:

inode = v9fs_qid_iget_dotl(sb, &st->qid, fid, st, new);

4. v9fs_qid_iget_dotl [fs/9p/vfs_inode_dotl.c, 133]:

v9fs_cache_inode_get_cookie(inode);
^--- Called independent of function argument "new"'s value here
https://github.com/torvalds/linux/blob/e8b767f5e04097aaedcd6e06e2270f9fe5282696/fs/9p/vfs_inode_dotl.c#L133

5. v9fs_cache_inode_get_cookie [fs/9p/cache.c, 68]:

v9inode->fscache =
fscache_acquire_cookie(v9fs_session_cache(v9ses),
0,
&path, sizeof(path),
&version, sizeof(version),
i_size_read(&v9inode->vfs_inode));

6. fscache_acquire_cookie [include/linux/fscache.h, 251]:

return __fscache_acquire_cookie(volume, advice,
index_key, index_key_len,
aux_data, aux_data_len,
object_size);

7. __fscache_acquire_cookie [fs/fscache/cookie.c, 472]:

if (!fscache_hash_cookie(cookie)) {
fscache_see_cookie(cookie, fscache_cookie_discard);
fscache_free_cookie(cookie);
return NULL;
}

8. fscache_hash_cookie [fs/fscache/cookie.c, 430]:

pr_err("Duplicate cookie detected\n");

> > With QEMU >= 5.2 you should see the following QEMU warning with your
> > reproducer:
> >
> > "
> > qemu-system-x86_64: warning: 9p: Multiple devices detected in same VirtFS
> > export, which might lead to file ID collisions and severe misbehaviours on
> > guest! You should either use a separate export for each device shared from
> > host or use virtfs option 'multidevs=remap'!
> > "
>
> oh, I wasn't aware of the new option. Good job there!
>
> It's the easiest way to reproduce but there are also harder to fix
> collisions, file systems only guarantee unicity for (fsid,inode
> number,version) which is usually bigger than 128 bits (although version
> is often 0), but version isn't exposed to userspace easily...
> What we'd want for unicity is handle from e.g. name_to_handle_at but
> that'd add overhead, wouldn't fit in qid path and not all fs are capable
> of providing one... The 9p protocol just doesn't want bigger handles
> than qid path.

No bigger qid.path on 9p protocol level in future? Why?

> And, err, looking at the qemu code
>
> qidp->version = stbuf->st_mtime ^ (stbuf->st_size << 8);
>
> so the qid is treated as "data version",
> but on kernel side we've treated it as inode version (i_version, see
> include/linux/iversion.h)
>
> (v9fs_test_inode_dotl checks the version is the same when comparing two
> inodes) so it will incorrectly identify two identical inodes as
> different.
> That will cause problems...
> Since you'll be faster than me could you try keeping it at 0 there?

I tried with your suggested change on QEMU side:

diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c
index a6d6b3f835..5d9be87758 100644
--- a/hw/9pfs/9p.c
+++ b/hw/9pfs/9p.c
@@ -981,7 +981,7 @@ static int stat_to_qid(V9fsPDU *pdu, const struct stat *stbuf, V9fsQID *qidp)
memcpy(&qidp->path, &stbuf->st_ino, size);
}

- qidp->version = stbuf->st_mtime ^ (stbuf->st_size << 8);
+ qidp->version = 0;
qidp->type = 0;
if (S_ISDIR(stbuf->st_mode)) {
qidp->type |= P9_QID_TYPE_DIR;

Unfortunately it did not make any difference for these 2 Linux kernel fs-cache
issues at least; still same errors, and same suboptimal performance.

Best regards,
Christian Schoenebeck