Re: silent semantic changes with reiser4

From: Jamie Lokier
Date: Thu Aug 26 2004 - 15:29:29 EST


John Stoffel wrote:
> Jamie> Yes, exactly that. The streams are created on demand of
> Jamie> course, and by userspace helpers when that's appropriate which
> Jamie> I suspect it almost always is.
>
> So how would a program that converts between a JPEG file (with exif
> data) and a PNG work, such as ImageMagick? Are we proposing to teach
> the VFS (or worse yet each filesystem) how to do this?

No. (1) That's what "userspace helpers" are for. (2) What does image
format conversion have to do with viewing files in their component
parts? (3) I suspect someone will write a plugin that does indeed
convert virtually every known image format to a common format
(probably PNG and something "raw") at some point. Why not? It's just
a small script (it would run ImageMagick!).

> I've been following this discussion a bit and I'm not sure that I've
> actually seen any concrete examples of where this is a *good* thing to
> have. People talk about only having to modify 20 bytes at a time
> instead of reading and writing 1mb of data. Isn't that what mmap()
> does?

Sure, if you think mmap() is an easy substitute for "parse the author
& license tags from this [unknown format] audio, video or font file".
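
A minimal sketch of the gap here, assuming a made-up toy format (an
8-byte magic, then a fixed 20-byte author field, then payload). The
layout and the names patch_in_place and demo are hypothetical, chosen
only for illustration. mmap() makes the 20-byte write cheap, but only
once something that understands the format has supplied the offset:

```python
import mmap
import os
import tempfile

def patch_in_place(path, offset, new_bytes):
    """Overwrite len(new_bytes) bytes at a known offset via mmap."""
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), 0) as m:
            # Slice assignment must not change the mapping's size.
            m[offset:offset + len(new_bytes)] = new_bytes
            m.flush()

def demo():
    # Hypothetical toy layout: 8-byte magic, 20-byte author, payload.
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(b"TOYFMT01" + b"unknown".ljust(20) + b"\0" * 1024)
        # The offset (8) and width (20) come from knowing the format;
        # mmap itself knows nothing about where the author tag lives.
        patch_in_place(path, 8, b"Jamie".ljust(20))
        with open(path, "rb") as f:
            data = f.read()
        return data[8:28].rstrip()
    finally:
        os.remove(path)
```

For a real audio, video or font file the offset and width are not
constants, so the format-specific parsing still has to live somewhere.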

> Now I can sorta understand the idea that having a directory look like
> a file is neat, and certainly simplifies some aspects, but I think
> that going all the way down to the logical conclusion here is a bit
> silly.
>
> To use the principle of blowing things up to make them very large or
> very small, what happens if I decide that the best idea is to make all
> files just be directories which contain single byte files? Isn't that
> the logical extreme here? So my 1mb JPEG file is not just some image
> data and header info in multiple files, but it's really just 1
> million (ok 1024 * 1024) individual files that the VFS knows how to
> put together. Seems like the logical extreme. Oh wait, maybe we
> should be exposing a single file per bit instead!

If you decide to do that, you are welcome to write the userspace
helper which creates that view in your directory.

Why not? It's very silly, but it's no sillier than writing a program
to read your 1mb JPEG and write it out one file per bit, which you're
welcome to write right now.
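
For what it's worth, such a program is only a few lines. The sketch
below works per byte rather than per bit, and the names explode and
reassemble and the ".bytes" directory suffix are hypothetical. It
writes ordinary files into an ordinary directory; it does not use any
reiser4 or VFS interface:

```python
import os
import tempfile

def explode(src_path, view_dir):
    """Write each byte of src_path to its own file, named by offset."""
    os.makedirs(view_dir, exist_ok=True)
    with open(src_path, "rb") as f:
        data = f.read()
    for offset, byte in enumerate(data):
        # Zero-padded names keep lexical order equal to byte order.
        with open(os.path.join(view_dir, f"{offset:08d}"), "wb") as out:
            out.write(bytes([byte]))
    return len(data)

def reassemble(view_dir):
    """Concatenate the per-byte files back into the original content."""
    parts = []
    for name in sorted(os.listdir(view_dir)):
        with open(os.path.join(view_dir, name), "rb") as f:
            parts.append(f.read())
    return b"".join(parts)

def demo():
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "image.jpg")
        payload = b"\xff\xd8 not a real JPEG \xff\xd9"
        with open(src, "wb") as f:
            f.write(payload)
        view = os.path.join(tmp, "image.jpg.bytes")
        count = explode(src, view)
        return count, len(payload), reassemble(view) == payload
```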

> From my point of view, there lies madness. As Rik pointed out, how do
> backup and restore tools work with this stuff? Most people could care
> less about how their data is organized, but they certainly care when
> they can't restore it from backups.

Generated views are something which should _not_ be backed up and
restored. That's not what they are for.

Auxiliary metadata, such as author info and signatures which cannot be
stored in the main file for some reason -- that should be backed up.

Permissions, that should be backed up.

But not views which are computed from the main file. You don't need
to back them up, and they don't need to take any real space. They're
virtual, just like an infinitely deep directory on a web server can be
virtual. You can make those very silly too, if you want to.

(Some help from the filesystem with storing temporarily cached values
is fine, for performance, but we shouldn't pretend that generated
views are anything other than virtual).
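
The split can be sketched as a toy model. Everything here is an
assumption made for illustration (the FileObject class, its field
names, and the backup and restore functions); it is not how any
filesystem or backup tool represents things. The point it shows is
that a backup carries the main data, permissions and auxiliary
metadata, while views are recomputed from the data after a restore:

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

@dataclass
class FileObject:
    data: bytes                       # main file contents: backed up
    permissions: int = 0o644          # backed up
    aux_metadata: Dict[str, str] = field(default_factory=dict)  # backed up
    # Views are functions of the data; they hold no stored state.
    views: Dict[str, Callable[[bytes], bytes]] = field(default_factory=dict)

    def read_view(self, name):
        # Computed on demand from the main file every time.
        return self.views[name](self.data)

def backup(obj):
    """Save only what cannot be regenerated."""
    return {
        "data": obj.data,
        "permissions": obj.permissions,
        "aux_metadata": dict(obj.aux_metadata),
    }

def restore(saved, views):
    """Rebuild the object; the view generators are supplied again."""
    return FileObject(
        data=saved["data"],
        permissions=saved["permissions"],
        aux_metadata=dict(saved["aux_metadata"]),
        views=views,
    )
```

A cache of computed view contents could be added for speed, but it
would stay out of backup() for the same reason.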

> I'd really like to see a concrete example from Hans or other
> proponents about why this makes things easier/faster/better to do.
> Mostly, I've just seen "proof by vigorous handwaving" that it's a good
> thing.
>
> In a lot of ways, I think people are going in the wrong direction; you
> want to encapsulate and hide the details more, not expose them.
> That's what a good API does, it hides the details while giving you a
> rich set of semantics to manage your data.

Would you like to propose an alternate API? I was under the
impression that we were discussing one, right here in this thread.
It's called POSIX with an extension where you can cd into a file. It
has a rich set of semantics to manage your data, and works with a lot
of familiar programs.

A better one would be welcome, if you have an idea.

> God knows I'm not smart enough or driven enough to actually come up
> with my own ideas, but I certainly haven't seen anyone else (even
> Linus) come up with an earth-shattering argument to say why this is
> the right move to make.

Come up with a better one; otherwise it's right ;)

> As Linus says, most of the OS's job is to mediate access to
> objects/data. Why do we want to expose such low level data then?

Eh? In this case the OS is mediating by providing a uniform interface
between programs that access the data and programs that handle file
formats.

What is this "low level" data you speak of? I would say something
like the "designer" of a font, or the "from address" and "subject" of
an email are very high level abstractions, and that's among the sorts
of things which are to be exposed by these interfaces.
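
As a rough sketch of what a userspace helper for the email case might
compute, assuming Python's standard email package does the format
handling: the function name header_view and the lowercase component
names are hypothetical, and how the result would be presented as files
inside the message is left open.

```python
from email import message_from_string

def header_view(raw_message, fields=("From", "Subject")):
    """Return the requested headers as a name -> value mapping."""
    msg = message_from_string(raw_message)
    # Headers missing from the message are simply left out of the view.
    return {name.lower(): msg[name] for name in fields if msg[name] is not None}

# Placeholder message; the address is made up for the example.
RAW = (
    "From: Jamie Lokier <jamie@example.org>\n"
    "Subject: Re: silent semantic changes with reiser4\n"
    "\n"
    "body text\n"
)
```

A program reading the "subject" component would then need no knowledge
of RFC 822 headers, which is the uniform interface being argued for.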

-- Jamie