Re: multiply files in one (was GNU/Linux stance by Richard Stallman)

Larry McVoy (lm@bitmover.com)
Tue, 30 Mar 1999 00:09:26 -0800


: What I was getting at
: was that if you pack the data together, then read-ahead will yield
: more complete files, which translates to less transactions in the end.

Read ahead really doesn't solve the problem for two reasons:

1) the allocation policies in almost all file systems is file
centric - it's careful to get *a* file contiguous but isn't
careful to get multiple files in the same directory all next
to each other

2) even if (1) was solved, the file system needs to know that it
bring in more data. If the file it is reading is 1K long,
why should it brin in the next 5MB of data?

: I don't care about saving disc space in terms of fitting all your data
: onto the disc: disc is cheap. If you run out of room, buy some more.

Amusing to see you say that after all of your arguments about not
upgrading those 386's, but whatever, I agree. In general, disk is
extremely cheap and time is not. (My wife keeps reminding me that
backups ar not cheap and she has a point...)

: > First of all, I use "tar" to mean "a wad of files all jammed together,
: > possibly with gaps". Thinking tar == tar(1) is going to lead you into
: > details that I didn't mean. You caould use ar or gdbm or whatever,
: > all I care is that one I/O gets you the whole wad.
:
: OK, this leads me to think that you're focussed on getting all the
: blocks for files in a directory in a contiguous chunk. Is this
: correct?

Yes. Absolutely.

: > The idea of making one I/O work by putting the data in the directory
: > entries doesn't work when you limit the amount of data because there
: > is no nice natural size where you would draw the line.
:
: If you're referring to the proposed trick of putting data inside the
: inode for ext2fs, I agree.

Yes, that's a crock - it only works for symlinks (though it works really
well there, just put the link in the indirect block pointers which are
already there - it's free and it works about 99% of the time).

: > If resierfs could do stuff like "hmm, all the files in this
: > directory put together are < 5MB total, I'll tar 'em up", then
: > that's what I want.
:
: Why is this tarring step important? If instead it puts all the inode
: and data blocks for all the files in a 5 MByte contiguous chunk,
: what's wrong with that?
:
: Or do you see the "put inode & data in a chunk for all the files"
: operation as "tar 'em up"? If that's the case, then we've been
: violently agreeing but have confused each other with different
: meanings for the same terms.

I think we're agreeing... hard to imagine :-)

: > If resierfs was really, realy cool, it would have multiple tar files
: > per directory - one tar file per uid.
:
: How often do you get a mix of people writing to the same directory,
: other than /tmp?

Ahh, it's not necessary that they are writing to the same directory.
There are two directories but only one free block list (bitmap, whatever).
That's why the allocation poliy is so damn important. In fact, I'd go
so far as to say that the allocation policy of a file system *is* the
file system, everything else is 2 orders of magnitude less complex and
less important (which is the whole basis for my hatred of LFS - there
is no bloody allocation policy, extremely brain dead).

: > : halve the number of seeks, and will make the remaining seeks
: > : generally less expensive (no need to seek between two different
: > : areas on the disc)
: >
: > Bzzt. One I/O for the whole mess. Even if seeks were free, rotational
: > delays are not and I/O transactions are not. I want one I/O for the
: > whole directory.
:
: Yes, agreed, but I wanted to establish a point, that the number of
: seeks would be halved. I just wanted to check if you agreed with that
: before going on to the next point:

It's a meaningless point - it doesn't matter if we agree. It's like
agreeing that losing a leg isn't as bad as getting killed when we are
talking about a long life. Sure, losing a leg isn't as bad, but the goal
isn't to limp along, the goal is to acheive perfection (whee, count the
double meanings in there :-)

Going from N seeks to N/2 when you could go to 2 is not much of an improvement.

: Is your core point, then, that all data blocks (and secondarily, all
: inode blocks) for files in a directory are contiguous on the disc?
:
: If so, then that's good/frightening, because that's also what I
: advocate. I don't recall if reiserfs does this, but I hope it does.

Sort of. For this sort of thing to work, the file syste upper layers would
need to know about it. So that's a different file type, the "tar" or "ar"
or "clumped" file type. In that file type, there are two locations: the
inode and the data. The data contains the inodes and the data for all the
files in that "clump". The data may be sparse or have holes due to
realocations.

: > Remember, this has to work when you are untarring your crud and I am
: > untarring my crud and we are both allocating from the bitmap/free
: > list/ whatever at the same time. Waffle actually gets this right,
: > or so I was told by Hitz when I asked.
:
: You're mentioning untarring, but I presume you just mean that multiple
: users are somehow writing files into a FS at the same time?

Yeah, but the hard problem is not

user1$ dd of=XXX if=/dev/zero move=1g
user2$ dd of=XXX if=/dev/zero move=1g

that's pretty easy to get right because all you need are big enough
contig regions that the seeks get lost in the noise. In other words,
you need not worry about interleaving very much.

The hard problem is

gooch$ untar linux-2.2.99.tar
lm$ untar linux-2.2.103.tar

at the same time, allocating from the same file system. Waffle actually
seems to do the right thing, though I've never extracted details from Dave.

: I hope you at least agree with your SO ;-)

Well, I am married, so by definition all conversations with the SO start
out "Well, of course you are right ..." - that's the key to surviving
a marriage. You guys are the only people I get to flame anymore... Sigh.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/