Re: [PATCH v3 0/6] Composefs: an opportunistically sharing verified image filesystem

From: Dave Chinner
Date: Tue Jan 24 2023 - 23:18:56 EST


On Tue, Jan 24, 2023 at 09:06:13PM +0200, Amir Goldstein wrote:
> On Tue, Jan 24, 2023 at 3:13 PM Alexander Larsson <alexl@xxxxxxxxxx> wrote:
> > On Tue, 2023-01-24 at 05:24 +0200, Amir Goldstein wrote:
> > > On Mon, Jan 23, 2023 at 7:56 PM Alexander Larsson <alexl@xxxxxxxxxx> wrote:
> > > > On Fri, 2023-01-20 at 21:44 +0200, Amir Goldstein wrote:
> > > > > On Fri, Jan 20, 2023 at 5:30 PM Alexander Larsson <alexl@xxxxxxxxxx> wrote:
> > I'm not sure why the dentry cache case would be more important?
> > Starting a new container will very often not have cached the image.
> >
> > To me the interesting case is for a new image, but with some existing
> > page cache for the backing files directory. That seems to model starting
> > a new image in an active container host, but it's somewhat hard to test
> > that case.
> >
>
> ok, you can argue that faster cold cache ls -lR is important
> for starting new images.
> I think you will be asked to show a real life container use case where
> that benchmark really matters.

I've already described the real world production system bottlenecks
that composefs is designed to overcome in a previous thread.

Please go back and read this:

https://lore.kernel.org/linux-fsdevel/20230118002242.GB937597@xxxxxxxxxxxxxxxxxxx/

Cold cache performance dominates the runtime of short lived
containers as well as high density container hosts being run to
their container level memory limits. `ls -lR` is just a
microbenchmark that demonstrates how much better composefs cold
cache behaviour is than the alternatives being proposed....

This might also help explain why my initial review comments focussed
on getting rid of optional format features, straight lining the
processing, changing the format or search algorithms so more
sequential cacheline accesses occurred, resulting in fewer memory
stalls, etc. i.e. reductions in cold cache lookup overhead will
directly translate into faster container workload spin up.

> > > > This isn't all that strange, as overlayfs does a lot more work for
> > > > each lookup, including multiple name lookups as well as several
> > > > xattr lookups, whereas composefs just does a single lookup in a
> > > > pre-computed
> > >
> > > Seriously, "multiple name lookups"?
> > > Overlayfs does exactly one lookup for anything but first level subdirs
> > > and for sparse files it does the exact same lookup in /objects as
> > > composefs.
> > > Enough with the hand waving please. Stick to hard facts.
> >
> > With the discussed layout, in a stat() call on a regular file,
> > ovl_lookup() will do lookups on both the sparse file and the backing
> > file, whereas cfs_dir_lookup() will just map some page cache pages and
> > do a binary search.
> >
> > Of course if you actually open the file, then cfs_open_file() would do
> > the equivalent lookups in /objects. But that is often not what happens,
> > for example in "ls -l".
> >
> > Additionally, these extra lookups will cause extra memory use, as you
> > need dentries and inodes for the erofs/squashfs inodes in addition to
> > the overlay inodes.
>
> I see. composefs is really very optimized for ls -lR.

No, composefs is optimised for minimal namespace and inode
resolution overhead. 'ls -lR' does a lot of these operations, and
therefore you see the efficiency of the design being directly
exposed....
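
To make "minimal resolution overhead" concrete, the kind of lookup
Alexander describes above (a binary search over precomputed, name-sorted
directory entries) can be sketched roughly as below. This is purely
illustrative: the function name, the two parallel lists and the use of
Python are all assumptions made for the example, and it is not the
composefs on-disk format or the cfs_dir_lookup() implementation.

```python
from bisect import bisect_left


def dir_lookup(names, inode_numbers, name):
    """Look up `name` in a directory stored as a name-sorted table.

    `names` is a sorted list of byte strings and `inode_numbers` is a
    parallel list (both hypothetical, for illustration only).
    Returns the matching inode number, or None if the name is absent.
    """
    # One binary search over an already-sorted table: O(log n) comparisons
    # and no per-component calls into another filesystem.
    i = bisect_left(names, name)
    if i < len(names) and names[i] == name:
        return inode_numbers[i]
    return None
```

The point of the sketch is only that a stat() style lookup can be
resolved entirely from one precomputed table, without touching the
backing file.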

> Now only need to figure out if real users start a container and do ls -lR
> without reading many files is a real life use case.

I've been using 'ls -lR' and 'find . -ctime 1' to benchmark cold
cache directory iteration and inode lookup performance for roughly
20 years. The benchmarks I run *never* read file data, nor is that
desired - they are pure directory and inode lookup micro-benchmarks
used to analyse VFS and filesystem directory and inode lookup
performance.
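
For anyone unfamiliar with what such a microbenchmark exercises, here is
a rough sketch of the same access pattern as 'ls -lR': iterate every
directory and stat every inode, never opening a file for data. The
function name walk_and_stat() is made up for this example, and it is a
stand-in for the real tools, not what I actually run. For a cold cache
run the caches have to be dropped first, e.g. as root by writing 3 to
/proc/sys/vm/drop_caches.

```python
import os
import time


def walk_and_stat(root):
    """Recursively list `root` and stat every entry without reading file data.

    Returns (entries_seen, elapsed_seconds).
    """
    count = 0
    start = time.perf_counter()
    stack = [root]
    while stack:
        path = stack.pop()
        try:
            with os.scandir(path) as it:
                for entry in it:
                    # Fetch the inode attributes, as `ls -l` would,
                    # without following symlinks.
                    entry.stat(follow_symlinks=False)
                    count += 1
                    if entry.is_dir(follow_symlinks=False):
                        stack.append(entry.path)
        except OSError:
            # An unreadable directory, or an entry that vanished mid-walk,
            # abandons the rest of that directory and moves on.
            continue
    return count, time.perf_counter() - start
```

Running it twice over the same tree, once after dropping caches and once
without, gives the cold versus warm cache comparison being discussed in
this thread.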

I have been presenting such measurements and patches improving
performance of these microbenchmarks to the XFS and fsdevel lists
for over 15 years and I have *never* had to justify that what I'm
measuring is a "real world workload" to anyone. Ever.

Complaining about real world relevancy of the presented benchmark
might be considered applying a double standard, wouldn't you agree?

-Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx