Re: 9p fs-cache tests/benchmark (was: 9p fscache Duplicate cookie detected)

From: Christian Schoenebeck
Date: Sat Apr 09 2022 - 07:16:35 EST


On Wednesday, 30 March 2022 14:21:16 CEST Christian Schoenebeck wrote:
> I made some tests & benchmarks regarding the fs-cache issue of 9p, running
> different kernel versions and kernel configurations in comparison.
[...]
> Case  Linux kernel version           .config  msize    cache  duration  host cpu  errors/warnings
>
> A)    5.17.0+[2] + msize patches[1]  debug    4186112  mmap   20m 40s   ~80%      none
> B)    5.17.0+[2] + msize patches[1]  debug    4186112  loose  31m 28s   ~35%      several errors (compilation completed)
> C)    5.17.0+[2] + msize patches[1]  debug    507904   mmap   20m 25s   ~84%      none
> D)    5.17.0+[2] + msize patches[1]  debug    507904   loose  31m 2s    ~33%      several errors (compilation completed)
> E)    5.17.0+[2]                     debug    512000   mmap   23m 45s   ~75%      none
> F)    5.17.0+[2]                     debug    512000   loose  32m 6s    ~31%      several errors (compilation completed)
> G)    5.17.0+[2]                     release  512000   mmap   23m 18s   ~76%      none
> H)    5.17.0+[2]                     release  512000   loose  32m 33s   ~31%      several errors (compilation completed)
> I)    5.17.0+[2] + msize patches[1]  release  4186112  mmap   20m 30s   ~83%      none
> J)    5.17.0+[2] + msize patches[1]  release  4186112  loose  31m 21s   ~31%      several errors (compilation completed)
> K)    5.10.84                        release  512000   mmap   39m 20s   ~80%      none
> L)    5.10.84                        release  512000   loose  13m 40s   ~55%      none
[...]
> About the errors: with cache=loose and a recent kernel version, I already see
> errors just when booting the guest OS. For these tests I chose sources that
> allowed the build to complete, so that I could capture a benchmark as well.
> I got some "soft" errors with those, but at least the build completed.
> Other sources, OTOH, did not complete and aborted with invalid file
> descriptor errors, so I obviously could not use them for these benchmarks.
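
To put the quoted numbers in perspective, here is a minimal Python sketch (not
part of the original measurements) that converts the durations of cases G, H,
K and L to seconds and compares cache=loose against cache=mmap per kernel. The
names RESULTS, to_seconds and loose_vs_mmap are made up for illustration; the
figures are copied from the quoted table.

```python
import re

# (case, kernel, cache mode, build duration), copied from the quoted table.
# Only the "release" .config rows with msize 512000 are used here.
RESULTS = [
    ("G", "5.17.0+", "mmap", "23m 18s"),
    ("H", "5.17.0+", "loose", "32m 33s"),
    ("K", "5.10.84", "mmap", "39m 20s"),
    ("L", "5.10.84", "loose", "13m 40s"),
]


def to_seconds(duration):
    """Convert a duration such as '23m 18s' or '31m 2s' to seconds."""
    match = re.fullmatch(r"(\d+)m\s+(\d+)s", duration.strip())
    if match is None:
        raise ValueError(f"unexpected duration format: {duration!r}")
    minutes, seconds = map(int, match.groups())
    return minutes * 60 + seconds


def loose_vs_mmap(results):
    """Return {kernel: loose_seconds / mmap_seconds} for each kernel."""
    by_kernel = {}
    for _case, kernel, cache, duration in results:
        by_kernel.setdefault(kernel, {})[cache] = to_seconds(duration)
    return {k: v["loose"] / v["mmap"] for k, v in by_kernel.items()}


if __name__ == "__main__":
    for kernel, ratio in loose_vs_mmap(RESULTS).items():
        print(f"{kernel}: cache=loose takes {ratio:.2f}x the cache=mmap time")
```

With those four rows it reports that cache=loose took roughly 0.35x the
cache=mmap time on 5.10.84, but roughly 1.40x on 5.17.0+.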

I used git-bisect to identify the commit that broke 9p behaviour, and it is
indeed this one:

commit eb497943fa215897f2f60fd28aa6fe52da27ca6c (HEAD, refs/bisect/bad)
Author: David Howells <dhowells@xxxxxxxxxx>
Date: Tue Nov 2 08:29:55 2021 +0000

    9p: Convert to using the netfs helper lib to do reads and caching

    Convert the 9p filesystem to use the netfs helper lib to handle readpage,
    readahead and write_begin, converting those into a common issue_op for the
    filesystem itself to handle. The netfs helper lib also handles reading
    from fscache if a cache is available, and interleaving reads from both
    sources.

    This change also switches from the old fscache I/O API to the new one,
    meaning that fscache no longer keeps track of netfs pages and instead does
    async DIO between the backing files and the 9p file pagecache. As a part
    of this change, the handling of PG_fscache changes. It now just means that
    the cache has a write I/O operation in progress on a page (PG_locked
    is used for a read I/O op).

    Note that this is a cut-down version of the fscache rewrite and does not
    change any of the cookie and cache coherency handling.

    Changes
    =======
    ver #4:
     - Rebase on top of folios.
     - Don't use wait_on_page_bit_killable().

    ver #3:
     - v9fs_req_issue_op() needs to terminate the subrequest.
     - v9fs_write_end() needs to call SetPageUptodate() a bit more often.
     - It's not CONFIG_{AFS,V9FS}_FSCACHE[1]
     - v9fs_init_rreq() should take a ref on the p9_fid and the cleanup should
       drop it [from Dominique Martinet].

    Signed-off-by: David Howells <dhowells@xxxxxxxxxx>
    Reviewed-and-tested-by: Dominique Martinet <asmadeus@xxxxxxxxxxxxx>
    cc: v9fs-developer@xxxxxxxxxxxxxxxxxxxxx
    cc: linux-cachefs@xxxxxxxxxx
    Link: https://lore.kernel.org/r/YUm+xucHxED+1MJp@xxxxxxxxxxxxx/ [1]
    Link: https://lore.kernel.org/r/163162772646.438332.16323773205855053535.stgit@xxxxxxxxxxxxxxxxxxxxxx/ # rfc
    Link: https://lore.kernel.org/r/163189109885.2509237.7153668924503399173.stgit@xxxxxxxxxxxxxxxxxxxxxx/ # rfc v2
    Link: https://lore.kernel.org/r/163363943896.1980952.1226527304649419689.stgit@xxxxxxxxxxxxxxxxxxxxxx/ # v3
    Link: https://lore.kernel.org/r/163551662876.1877519.14706391695553204156.stgit@xxxxxxxxxxxxxxxxxxxxxx/ # v4
    Link: https://lore.kernel.org/r/163584179557.4023316.11089762304657644342.stgit@xxxxxxxxxxxxxxxxxxxxxx # rebase on folio
    Signed-off-by: Dominique Martinet <asmadeus@xxxxxxxxxxxxx>

So Linux kernel v5.15 is fine, v5.16 is broken.

Best regards,
Christian Schoenebeck