Re: [PATCH v3 0/6] Composefs: an opportunistically sharing verified image filesystem

From: Gao Xiang
Date: Fri Jan 27 2023 - 05:25:06 EST




On 2023/1/25 18:15, Alexander Larsson wrote:
On Wed, 2023-01-25 at 18:05 +0800, Gao Xiang wrote:


On 2023/1/25 17:37, Alexander Larsson wrote:
On Tue, 2023-01-24 at 21:06 +0200, Amir Goldstein wrote:
On Tue, Jan 24, 2023 at 3:13 PM Alexander Larsson <alexl@xxxxxxxxxx>

...


They are all strictly worse than squashfs in the above testing.


It would be interesting to know why, and whether an optimized
mkfs.erofs or mkfs.ext4 would have made any improvement.

Even the non-loopback mounted (direct xfs backed) version performed
worse than the squashfs one. I'm sure an erofs with sparse files would
do better due to a more compact file, but I don't really see how it
would perform significantly differently from the squashfs code. Yes,
squashfs lookup is linear in directory length, while erofs is log(n),
but the directories are not so huge that this would dominate the
runtime.
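
For illustration only, here is a minimal Python sketch of the
linear-versus-log(n) point above. It counts name comparisons for a
sequential scan and for a binary search over a sorted list of names.
The function names and the in-memory list are assumptions made for this
example; it does not model the actual squashfs or erofs on-disk layouts
or their I/O.

```python
def linear_lookup(names, target):
    """Scan entries in order; return (index, comparisons made)."""
    comparisons = 0
    for i, name in enumerate(names):
        comparisons += 1
        if name == target:
            return i, comparisons
    return -1, comparisons


def binary_lookup(sorted_names, target):
    """Binary search over sorted names; return (index, comparisons made)."""
    lo, hi = 0, len(sorted_names) - 1
    comparisons = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        comparisons += 1
        if sorted_names[mid] == target:
            return mid, comparisons
        if sorted_names[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1, comparisons


if __name__ == "__main__":
    # A hypothetical directory with 64 entries, already sorted by name.
    names = sorted(f"file{i:04d}" for i in range(64))
    print(linear_lookup(names, names[-1]))
    print(binary_lookup(names, names[-1]))
```

For directories of a few dozen entries the gap in comparison counts is
small in absolute terms, which is consistent with the expectation that
lookup complexity alone would not dominate the runtime.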

To get an estimate of this I made a broken version of the erofs image,
where the metacopy files are actually 0 bytes in size rather than
sparse. This made the erofs file 18M instead, and gained 10% in the
cold cache case. This, while good, is nowhere near enough to matter
compared to the others.

I don't think the base performance here is really much dependent on
the backing filesystem. An ls -lR workload is just a measurement of the
actual (i.e. non-dcache) performance of the filesystem implementation
of lookup and iterate, and overlayfs just has more work to do here,
especially in terms of the amount of i/o needed.
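
As a rough sketch of what such a workload exercises (this is not the
benchmark used in the thread), the following Python helper walks a tree
and calls lstat on every entry, much like ls -lR does. The function name
walk_and_stat is an assumption for this example. For a cold-cache number
the page cache and dcache would have to be dropped between runs, for
example by writing 3 to /proc/sys/vm/drop_caches as root.

```python
import os
import time


def walk_and_stat(root):
    """Iterate every directory under `root` and lstat each entry.

    Returns (number_of_entries, elapsed_seconds).
    """
    start = time.perf_counter()
    count = 0
    stack = [root]
    while stack:
        path = stack.pop()
        # scandir drives the filesystem's directory iteration.
        with os.scandir(path) as entries:
            for entry in entries:
                # An lstat per entry, as `ls -l` needs for its columns.
                entry.stat(follow_symlinks=False)
                count += 1
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)
    return count, time.perf_counter() - start


if __name__ == "__main__":
    import sys

    # The mount point to measure is passed on the command line.
    n, seconds = walk_and_stat(sys.argv[1] if len(sys.argv) > 1 else ".")
    print(f"{n} entries in {seconds:.3f}s")
```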

I will put together a formal mkfs.erofs version in a day or two, since
we're celebrating Lunar New Year now.

I've made a version and done some tests; it can be fetched from:
git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs-utils.git -b experimental

This feature can be used with -Ededupe or --chunksize=# (assuming
that all sparse files are holed, so that each file will have only
one chunk.)


Since you don't have more I/O traces for analysis, I have to make
another wild guess.

Could you help benchmark your v2 too? I'm not sure whether the same
performance is also seen with v2. The reason I guess this is that you
seem to read all dir inode pages when doing the first lookup, which
can benefit sequential dir access.

I'm not sure if EROFS can reach a similar number by forcing
readahead on dirs to read all dir data at once as well.

Apart from that I don't see a significant difference; at least
personally I'd like to know where such a huge difference could come
from. I don't think it is all because of the read-only on-disk format
difference.

I think the performance difference between v2 and v3 would be rather
minor in this case, because I don't think a lot of the directories are
large enough to be split in chunks. I also don't believe erofs and
composefs should fundamentally differ much in performance here, given
that both use a compact binary searchable layout for dirents. However,
the full comparison is "composefs" vs "overlayfs + erofs", and in that
case composefs wins.

I'm still on vacation. I will play with composefs personally to get
more insights when I'm back, but it would be much better if some
datasets could be provided for this as well (assuming the dataset can
be shown in public.)

Thanks,
Gao Xiang