Re: [PATCH v2 0/6] Composefs: an opportunistically sharing verified image filesystem

From: Christian Brauner
Date: Tue Jan 17 2023 - 05:12:17 EST


On Tue, Jan 17, 2023 at 09:05:53AM +0200, Amir Goldstein wrote:
> > It seems rather another an incomplete EROFS from several points
> > of view. Also see:
> > https://lore.kernel.org/all/1b192a85-e1da-0925-ef26-178b93d0aa45@xxxxxxxxxxxxx/T/#u
> >
>
> Ironically, ZUFS is one of two new filesystems that were discussed in LSFMM19,
> where the community reactions rhyme with the reactions to composefs.
> The discussion on Incremental FS resembles composefs case even more [1].
> AFAIK, Android is still maintaining Incremental FS out-of-tree.
>
> Alexander and Giuseppe,
>
> I'd like to join Gao is saying that I think it is in the best interest
> of everyone,
> composefs developers and prospect users included,
> if the composefs requirements would drive improvement to existing
> kernel subsystems rather than adding a custom filesystem driver
> that partly duplicates other subsystems.
>
> Especially so, when the modifications to existing components
> (erofs and overlayfs) appear to be relatively minor and the maintainer
> of erofs is receptive to new features and happy to collaborate with you.
>
> w.r.t overlayfs, I am not even sure that anything needs to be modified
> in the driver.
> overlayfs already supports "metacopy" feature which means that an upper layer
> could be composed in a way that the file content would be read from an arbitrary
> path in lower fs, e.g. objects/cc/XXX.
>
> I gave a talk on LPC a few years back about overlayfs and container images [2].
> The emphasis was that overlayfs driver supports many new features, but userland
> tools for building advanced overlayfs images based on those new features are
> nowhere to be found.
>
> I may be wrong, but it looks to me like composefs could potentially
> fill this void,
> without having to modify the overlayfs driver at all, or maybe just a
> little bit.
> Please start a discussion with overlayfs developers about missing driver
> features if you have any.

Surprising that I and others weren't Cced on this given that we had a
meeting with the main developers and a few others where we had said the
same thing. I hadn't followed this.

We have at least 58 filesystems currently in the kernel (and that's a
conservative count just based on going by obvious directories and
ignoring most virtual filesystems).

A non-insignificant portion is probably slowly rotting away with little
fixes coming in, with few users, and not much attention is being paid to
syzkaller reports for them if they show up. I haven't quantified this of
course.

Taking in a new filesystems into kernel in the worst case means that
it's being dumped there once and will slowly become unmaintained. Then
we'll have a few users for the next 20 years and we can't reasonably
deprecate it (Maybe that's another good topic: How should we fade out
filesystems.).

Of course, for most fs developers it probably doesn't matter how many
other filesystems there are in the kernel (aside from maybe competing
for the same users).

But for developers who touch the vfs every new filesystems may increase
the cost of maintaining and reworking existing functionality, or adding
new functionality. Making it more likely to accumulate hacks, adding
workarounds, or flatout being unable to kill off infrastructure that
should reasonably go away. Maybe this is an unfair complaint but just
from experience a new filesystem potentially means one or two weeks to
make a larger vfs change.

I want to stress that I'm not at all saying "no more new fs" but we
should be hesitant before we merge new filesystems into the kernel.

Especially for filesystems that are tailored to special use-cases.
Every few years another filesystem tailored to container use-cases shows
up. And frankly, a good portion of the issues that they are trying to
solve are caused by design choices in userspace.

And I have to say I'm especially NAK-friendly about anything that comes
even close to yet another stacking filesystems or anything that layers
on top of a lower filesystem/mount such as ecryptfs, ksmbd, and
overlayfs. They are hard to get right, with lots of corner cases and
they cause the most headaches when making vfs changes.