Re: [PATCH v3 0/6] Composefs: an opportunistically sharing verified image filesystem

From: Gao Xiang
Date: Tue Jan 24 2023 - 09:40:53 EST




On 2023/1/24 21:10, Alexander Larsson wrote:
> On Tue, 2023-01-24 at 05:24 +0200, Amir Goldstein wrote:
>> On Mon, Jan 23, 2023 at 7:56 PM Alexander Larsson <alexl@xxxxxxxxxx>
>>
>> ...
>>
>> No it is not overlayfs, it is overlayfs+squashfs, please stick to
>> facts.
>> As Gao wrote, squashfs does not optimize directory lookup.
>> You can run a test with ext4 for POC as Gao suggested.
>> I am sure that mkfs.erofs sparse file support can be added if needed.

> New measurements follow, they now include also erofs over loopback,
> although that isn't strictly fair, because that image is much larger
> due to the fact that it didn't store the files sparsely. It also
> includes a version where the topmost lower is directly on the backing
> xfs (i.e. not via loopback). I attached the scripts used to create the
> images and do the profiling in case anyone wants to reproduce.
>
> Here are the results (on x86-64, xfs base fs):
>
> overlayfs + loopback squashfs - uncached
> Benchmark 1: ls -lR mnt-ovl
>   Time (mean ± σ): 2.483 s ± 0.029 s [User: 0.167 s, System: 1.656 s]
>   Range (min … max): 2.427 s … 2.530 s 10 runs
>
> overlayfs + loopback squashfs - cached
> Benchmark 1: ls -lR mnt-ovl
>   Time (mean ± σ): 429.2 ms ± 4.6 ms [User: 123.6 ms, System: 295.0 ms]
>   Range (min … max): 421.2 ms … 435.3 ms 10 runs
>
> overlayfs + loopback ext4 - uncached
> Benchmark 1: ls -lR mnt-ovl
>   Time (mean ± σ): 4.332 s ± 0.060 s [User: 0.204 s, System: 3.150 s]
>   Range (min … max): 4.261 s … 4.442 s 10 runs
>
> overlayfs + loopback ext4 - cached
> Benchmark 1: ls -lR mnt-ovl
>   Time (mean ± σ): 528.3 ms ± 4.0 ms [User: 143.4 ms, System: 381.2 ms]
>   Range (min … max): 521.1 ms … 536.4 ms 10 runs
>
> overlayfs + loopback erofs - uncached
> Benchmark 1: ls -lR mnt-ovl
>   Time (mean ± σ): 3.045 s ± 0.127 s [User: 0.198 s, System: 1.129 s]
>   Range (min … max): 2.926 s … 3.338 s 10 runs
>
> overlayfs + loopback erofs - cached
> Benchmark 1: ls -lR mnt-ovl
>   Time (mean ± σ): 516.9 ms ± 5.7 ms [User: 139.4 ms, System: 374.0 ms]
>   Range (min … max): 503.6 ms … 521.9 ms 10 runs
>
> overlayfs + direct - uncached
> Benchmark 1: ls -lR mnt-ovl
>   Time (mean ± σ): 2.562 s ± 0.028 s [User: 0.199 s, System: 1.129 s]
>   Range (min … max): 2.497 s … 2.585 s 10 runs
>
> overlayfs + direct - cached
> Benchmark 1: ls -lR mnt-ovl
>   Time (mean ± σ): 524.5 ms ± 1.6 ms [User: 148.7 ms, System: 372.2 ms]
>   Range (min … max): 522.8 ms … 527.8 ms 10 runs
>
> composefs - uncached
> Benchmark 1: ls -lR mnt-fs
>   Time (mean ± σ): 681.4 ms ± 14.1 ms [User: 154.4 ms, System: 369.9 ms]
>   Range (min … max): 652.5 ms … 703.2 ms 10 runs
>
> composefs - cached
> Benchmark 1: ls -lR mnt-fs
>   Time (mean ± σ): 390.8 ms ± 4.7 ms [User: 144.7 ms, System: 243.7 ms]
>   Range (min … max): 382.8 ms … 399.1 ms 10 runs
>
> For the uncached case, composefs is still almost four times faster than
> the fastest overlay combo (squashfs), and the non-squashfs versions are
> strictly slower. For the cached case the difference is less (10%) but
> with similar order of performance.
>
> For size comparison, here are the resulting images:
>
> 8.6M large.composefs
> 2.5G large.erofs
> 200M large.ext4
> 2.6M large.squashfs

Ok, I have to say I'm a bit surprised by these results. Just a wild
guess: `ls -lR` is a sequential-like access pattern, so compressed data
(assuming you use compression) benefits from it. I cannot identify the
real cause before looking into it more. EROFS is impacted since EROFS
on-disk inodes are not arranged together with the current mkfs.erofs
implementation (that is just a userspace implementation detail; if
people really care about it, I will refine the implementation), and I
will also implement sparse file support later so that the on-disk
inodes won't be impacted either (I'm on vacation, but I will try my
best).

From the overall results, I honestly don't know where the main
bottleneck is:
maybe, just as you said, overlayfs overhead;
or maybe the loopback device itself.

So it would be much better to also show some "ls -lR" results without
overlayfs stacked on top.
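
Something like the following quick sketch is all I mean -- a recursive
readdir + lstat walk (roughly what "ls -lR" exercises, minus the output
formatting), run once against the lower fs mount and once against the
overlayfs mount. It's untested and only for illustration (not one of
your attached scripts), and the mount paths are placeholders:

#include <dirent.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* recursively readdir + lstat everything under `dir`, like `ls -lR` */
static long walk(const char *dir)
{
    char path[PATH_MAX];
    struct dirent *de;
    struct stat st;
    long n = 0;
    DIR *d = opendir(dir);

    if (!d)
        return 0;
    while ((de = readdir(d)) != NULL) {
        if (!strcmp(de->d_name, ".") || !strcmp(de->d_name, ".."))
            continue;
        snprintf(path, sizeof(path), "%s/%s", dir, de->d_name);
        if (lstat(path, &st) != 0)
            continue;
        n++;
        if (S_ISDIR(st.st_mode))
            n += walk(path);
    }
    closedir(d);
    return n;
}

int main(int argc, char *argv[])
{
    /* e.g. ./statwalk mnt-erofs  vs  ./statwalk mnt-ovl */
    printf("%ld entries\n", walk(argc > 1 ? argv[1] : "."));
    return 0;
}

Comparing the two runs should tell how much of the uncached cost comes
from overlayfs itself rather than from the loopback-backed lower fs.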

IMHO, Amir's main point all along has been [1]:
"w.r.t overlayfs, I am not even sure that anything needs to be modified
in the driver.
overlayfs already supports "metacopy" feature which means that an upper
layer could be composed in a way that the file content would be read
from an arbitrary path in lower fs, e.g. objects/cc/XXX. "

I think there is nothing wrong with that (except for fs-verity). From
the results, such functionality can indeed already be achieved by
overlayfs + some local fs with some user-space adaptation. And it was
not mentioned in the RFC and v2.
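
Just to make that user-space adaptation concrete (an untested sketch
only; the paths are made up, following Amir's objects/cc/XXX example):
the upper layer carries zero-sized metadata-only inodes whose data is
redirected to the object store in a lower layer through the overlayfs
xattrs. The kernel normally writes these xattrs itself on copy-up, so a
pre-composed upper like this needs metacopy=on and redirect_dir=on at
mount time plus CAP_SYS_ADMIN for the trusted.* xattrs:

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/xattr.h>
#include <unistd.h>

int main(void)
{
    /* placeholder paths, following Amir's objects/cc/XXX example */
    const char *upper = "upper/some-file";
    const char *target = "/objects/cc/xxx";

    /* zero-sized metadata-only inode in the upper layer */
    int fd = open(upper, O_CREAT | O_WRONLY, 0644);
    if (fd < 0) {
        perror("open");
        return 1;
    }
    close(fd);

    /* mark it metacopy and redirect its data to the lower object */
    if (setxattr(upper, "trusted.overlay.metacopy", "", 0, 0) ||
        setxattr(upper, "trusted.overlay.redirect",
                 target, strlen(target), 0)) {
        perror("setxattr");
        return 1;
    }
    return 0;
}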

So, setting the fs-verity requirement aside, your proposal currently
mainly resolves a performance issue of an existing in-kernel approach
(except for unprivileged mounts). It would be much better for the cover
letter to describe the original problem and why overlayfs + (a local fs
or FUSE for metadata) doesn't meet the requirements. That would make
much more sense than the current cover letter.
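
As for the fs-verity part, as far as I understand composefs relies on
the fs-verity digest of each backing file, and that digest can already
be read from user space with the existing ioctl, e.g. (rough sketch,
error handling trimmed, buffer sized for up to SHA-512):

#include <fcntl.h>
#include <linux/fsverity.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s <file>\n", argv[0]);
        return 1;
    }

    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    /* FS_IOC_MEASURE_VERITY fills in the algorithm and digest */
    struct fsverity_digest *d = malloc(sizeof(*d) + 64);
    d->digest_size = 64;
    if (ioctl(fd, FS_IOC_MEASURE_VERITY, d) < 0) {
        perror("FS_IOC_MEASURE_VERITY");
        return 1;
    }

    for (int i = 0; i < d->digest_size; i++)
        printf("%02x", d->digest[i]);
    printf("\n");
    return 0;
}

Where the expected digest is stored and which layer enforces the
comparison is, I think, exactly the part worth spelling out in the
cover letter.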

Thanks,
Gao Xiang

[1] https://lore.kernel.org/r/CAOQ4uxh34udueT-+Toef6TmTtyLjFUnSJs=882DH=HxADX8pKw@xxxxxxxxxxxxxx/