Re: multiply files in one

Richard Gooch (rgooch@atnf.csiro.au)
Tue, 30 Mar 1999 18:53:39 +1000


Larry McVoy writes:
> : What I was getting at
> : was that if you pack the data together, then read-ahead will yield
> : more complete files, which translates to less transactions in the end.
>
> Read ahead really doesn't solve the problem for two reasons:
>
> 1) the allocation policies in almost all file systems is file
> centric - it's careful to get *a* file contiguous but isn't
> careful to get multiple files in the same directory all next
> to each other

OK, let's assume (1), since we both agree that's required for good
performance.

> 2) even if (1) was solved, the file system needs to know that it
> bring in more data. If the file it is reading is 1K long,
> why should it brin in the next 5MB of data?

Ah, because the read-ahead size is not determine by the file size. At
its crudest, you hard-code 16 MBytes as the read-ahead size. Slightly
better, you do:
# mount -o rasize=16m /data
OR:
# mount -o rasize=1 /data
for a percentage of system RAM.

But the FS can be smarter, and check the size of all files in that
directory and if it falls under some threshold (#defined or supplied
via a mount option or whatever), it reads in all the data blocks for
that directory in one fell swoop.

Let me say that again: I always meant that you'd read-ahead some
MBytes of data, not a pathetic few blocks.

> : I don't care about saving disc space in terms of fitting all your data
> : onto the disc: disc is cheap. If you run out of room, buy some more.
>
> Amusing to see you say that after all of your arguments about not
> upgrading those 386's, but whatever, I agree.

Different circumstances. Sometimes we have to do the best we can with
limited hardware. Other times we want to improve things on big
systems.

> : > If resierfs was really, realy cool, it would have multiple tar files
> : > per directory - one tar file per uid.
> :
> : How often do you get a mix of people writing to the same directory,
> : other than /tmp?
>
> Ahh, it's not necessary that they are writing to the same directory.
> There are two directories but only one free block list (bitmap,
> whatever). That's why the allocation poliy is so damn important.
> In fact, I'd go so far as to say that the allocation policy of a
> file system *is* the file system, everything else is 2 orders of
> magnitude less complex and less important (which is the whole basis
> for my hatred of LFS - there is no bloody allocation policy,
> extremely brain dead).

I'm inclined to agree with that position. If you can get things right
at allocation time, you can get screaming performance at "run" time.

> : Is your core point, then, that all data blocks (and secondarily, all
> : inode blocks) for files in a directory are contiguous on the disc?
> :
> : If so, then that's good/frightening, because that's also what I
> : advocate. I don't recall if reiserfs does this, but I hope it does.
>
> Sort of. For this sort of thing to work, the file syste upper
> layers would need to know about it. So that's a different file
> type, the "tar" or "ar" or "clumped" file type. In that file type,
> there are two locations: the inode and the data. The data contains
> the inodes and the data for all the files in that "clump". The data
> may be sparse or have holes due to realocations.

Your clumped file type then is just a directory on a FS that ensures
all the inode and data blocks for the leaves of a directory are in a
single chunk on the disc. If what you mean is really that simple, then
say so. The references to "tar" and "ar" just confuse things for me.

> : > Remember, this has to work when you are untarring your crud and I am
> : > untarring my crud and we are both allocating from the bitmap/free
> : > list/ whatever at the same time. Waffle actually gets this right,
> : > or so I was told by Hitz when I asked.
> :
> : You're mentioning untarring, but I presume you just mean that multiple
> : users are somehow writing files into a FS at the same time?
>
> Yeah, but the hard problem is not
>
> user1$ dd of=XXX if=/dev/zero move=1g
> user2$ dd of=XXX if=/dev/zero move=1g
>
> that's pretty easy to get right because all you need are big enough
> contig regions that the seeks get lost in the noise. In other words,
> you need not worry about interleaving very much.

Agreed. I wasn't thinking about huge files. I was thinking about lots
of 9216-byte files.

> The hard problem is
>
> gooch$ untar linux-2.2.99.tar
> lm$ untar linux-2.2.103.tar
>
> at the same time, allocating from the same file system. Waffle
> actually seems to do the right thing, though I've never extracted
> details from Dave.

Agreed, that's hard. It all comes down to the allocation/realloction
policy of the FS. If the FS keeps all the inode and data blocks for
the files in a directory together, then the two users untarring the
kernel case you describe is easy to optimise.

You just need to get the FS to do that ;-) I should go back and check
the reiserfs docs about this.

Hans: any words on this?

Regards,

Richard....

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/