Re: silent semantic changes with reiser4

From: Jamie Lokier
Date: Thu Aug 26 2004 - 10:08:10 EST


Adrian Bunk wrote:
> What is the technical reason why a tar plugin should be reiser4
> specific, instead of a generic VFS or userspace solution that would
> allow the same also on other fs like e.g. iso9660?

It should be a generic VFS plugin, not reiser4 or userspace. The VFS
plugin should call out to userspace for most actions (except handling
cached data), and it should take advantage of special reiser4 features
for storage and performance optimisations. But it should still work
over a standard filesystem, when those special features aren't
available. I guess FUSE and many earlier projects are heading in this
direction.

A generic userspace solution doesn't let you "cd" into a tar file from
all programs like you can inside Midnight Commander.

Gnome and KDE take the approach that every userspace system call
should be intercepted and filtered, to create the illusion of virtual
data. As a result, different programs see different virtual data: you
can't just cut and paste a path from Gnome or KDE into any other
program. It's not just a "social problem of libraries" thing:
sometimes I have programs which don't link to libc. Sometimes I have
programs which mustn't link with anything that calls malloc(). It'd
be silly for them to have a different view of the filesystem just
because they can't link with some userspace library.

The Gnome/KDE/Midnight Commander pure userspace solution is silly: if
_every_ program in the system should get the same view, it makes much
more sense for the kernel to filter the system calls and redirect the
virtual accesses to a userspace daemon, while keeping the real
accesses at full speed.

Furthermore is makes much more sense for the kernel's page cache to
hold those uncompressed pages, than for every userspace application to
try and cooperatively manage a cache of uncompressed fragments in the
most inefficient way.

There's another problem with the Midnight Commander approach. If I
"cd" into a tar file, and then a program writes to the tar file, I
don't always see the changes straight away. The two views aren't
coherent. This isn't an easy problem to solve, but it should be
solved.

When a simple "cd" into .tar.gz or .iso is implemented properly, it
will have _no_ performance penalty after you have first looked in the
file, so long as it remains in the on-disk cache. And, the filesystem
will manage that cache intelligently.

Imagine: for looking at source files and such, you probably won't
bother untarring in future, and you won't bother keeping untarred
source trees in your home directory for easy access to things you look
at often. Why waste the space? You could install whole applications
as a .tar and run them from within it, with no performance penalty.

Similarly, the filesystem will be able to archive directories
automatically that haven't been touched in a long time, with no
visible change except increased storage space. "grep" will be a bit
slower, but you'll have a useful search tool by then (using coherent
indexes) which will be more useful than grep, and much faster.

-- Jamie
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/