Re: [PATCH v3 0/6] Composefs: an opportunistically sharing verified image filesystem

From: Christian Brauner
Date: Wed Jan 25 2023 - 10:25:01 EST


On Wed, Jan 25, 2023 at 02:46:59PM +0200, Amir Goldstein wrote:
> > >
> > > Based on Alexander's explanation about the differences between overlayfs
> > > lookup vs. composefs lookup of a regular "metacopy" file, I just need to
> > > point out that the same optimization (lazy lookup of the lower data
> > > file on open) can be done in overlayfs as well (*).
> > > (*) Currently, overlayfs also needs to look up the lower file for st_blocks.
> > >
> > > I am not saying that it should be done or that Miklos will agree to make
> > > this change in overlayfs, but that seems to be the major difference.
> > > getxattr may have some extra cost depending on in-inode xattr format
> > > of erofs, but specifically, the metacopy getxattr can be avoided if this
> > > is a special overlayfs RO mount that is marked as EVERYTHING IS
> > > METACOPY.
> > >
> > > I don't expect you guys to now try to hack overlayfs and explore
> > > this path to completion.
> > > My expectation is that this information will be clearly visible to anyone
> > > reviewing a future submission, e.g.:
> > >
> > > - This is the comparison we ran...
> > > - This is the reason that composefs gives better results...
> > > - It MAY be possible to optimize erofs/overlayfs to get to similar results,
> > > but we did not try to do that
> > >
> > > It is especially important IMO to get the ACK of both Gao and Miklos
> > > on your analysis, because remember that when this thread started,
> > > you did not know about the metacopy option and your main argument
> > > was saving the time it takes to create the overlayfs layer files in the
> > > filesystem, because you were missing some technical background on overlayfs.
> >
> > We knew about metacopy, which we already use in our tools to create
> > mapped image copies when idmapped mounts are not available, and we also
> > knew about the other new features in overlayfs. For example, the
> > "volatile" feature, which was mentioned in your
> > Overlayfs-containers-lpc-2020 talk, was only submitted upstream after
> > begging Miklos and Vivek for months. I had a PoC that I used and tested
> > locally, and I asked for their help to get it integrated at the
> > filesystem layer; using seccomp for the same purpose would have been more
> > complex and prone to errors when dealing with external bind mounts
> > containing persistent data.
> >
> > The only missing bit, at least from my side, was to consider an image
> > that contains only overlay metadata as something we could distribute.
> >
>
> I'm glad that I was able to point this out to you, because now the comparison
> between the overlayfs and composefs options is fairer.
>
> > I previously mentioned my wish to use it from a user namespace; that
> > goal seems more challenging with EROFS or any other block device. I
> > don't know how difficult it would be to get overlay metacopy working in
> > a user namespace, even though it would be helpful for other use cases as
> > well.
>

If you decide to try to make this work with overlayfs, I can make time
to help with both review and patches. I can see this being beneficial
for use cases we have with systemd as well, and we would actually use
it: we already make heavy use of overlayfs and will probably do so even
more in the future, on top of erofs.

(As a side note, idmapped mounts could be made usable from a userns in
the future; there's a TODO item and some ideas for this at
https://uapi-group.org/kernel-features.

Additionally, I want users to be able to use them without any userns in
the mix at all. Not just because there are legitimate users that don't
need to allocate a userns, but also because it would let us do things
like mapping a range of ids down to a single id (what NFS would probably
call "squashing"), among other things.)

Christian