Re: multiple files in one (was GNU/Linux stance by Richard Stallman)

Richard Gooch (rgooch@atnf.csiro.au)
Tue, 30 Mar 1999 12:51:28 +1000


Brandon S. Allbery writes:
> In message <199903300053.KAA01302@vindaloo.atnf.CSIRO.AU>, Richard Gooch writes
> :
> +-----
> | Larry McVoy writes:
> | > Yes, that's the point. Forget everything else and concentrate on
> | > designs where this point is true and you'll be rockin and rollin into
> | > the performance winner's circle.
> |
> | OK, we agree on that. But I'm under the impression that you think that
> | reading dozens of megabytes ahead *won't* give you much performance
> | improvement. Is that the case or not?
> +--->8
>
> It depends on which megabytes, no? A user-space "partitioned
> dataset" would most probably be arranged to be convenient for the
> program(s) that used it; accomplishing the same with disk blocks
> arranged to produce useful readahead is problematic, especially if
> the files change (extended or truncated and extended) often.

That blows a tarfs out of the water. Extending files in a tarfile is
very painful.

> Also, if you're depending on the system-wide buffer cache for this,
> you're potentially out of luck if you run multiple programs which
> each want different sets of files.

Same with tarfs. You can end up with "competing" tarfiles.

> BTW, the Ultrix print server we're finally retiring uses this trick
> for PostScript fragments used to initialize printers, print banner
> pages, etc., only with ar archives instead of tar.

Ah. A user-space solution ;-)

But we're talking about kernel tricks, and here I don't see the
advantage of reading in a tarfile on the fly (and of course read-ahead
is required to make this work) over just reading ahead lots of
blocks.
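
Just to pin down what I mean by "reading ahead lots of blocks", here
is a rough user-space sketch. It uses posix_fadvise(POSIX_FADV_WILLNEED),
which is a POSIX interface and not anything from this thread or from
our kernel, so take it purely as an illustration of the idea:

/* Rough illustration (not from this thread): hint the kernel to pull
 * a file's blocks into the page cache ahead of time, i.e. user-space
 * driven read-ahead.  posix_fadvise() is just one way to express it. */
#define _POSIX_C_SOURCE 200112L
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    int i;

    if (argc < 2) {
        fprintf(stderr, "Usage: %s file...\n", argv[0]);
        return 1;
    }
    for (i = 1; i < argc; i++) {
        int fd = open(argv[i], O_RDONLY);
        int err;

        if (fd < 0) {
            perror(argv[i]);
            continue;
        }
        /* "I will need all of this file soon": the kernel may start
         * reading its blocks in now, before any read() is issued.  */
        err = posix_fadvise(fd, 0, 0, POSIX_FADV_WILLNEED);
        if (err != 0)
            fprintf(stderr, "posix_fadvise(%s): %s\n", argv[i],
                    strerror(err));
        close(fd);
    }
    return 0;
}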

The advantage of tarfs is that for each file you don't have to seek
to the inode and then seek to the data blocks: there's only one seek
per file, so you get roughly a 2x speedup. You can get the same with
ext2fs by reading ahead the inode area. Tarfs still suffers from the
problem of having to seek around in the data block area if you're
accessing files scattered across the FS. Creating a tarfs for every
directory would make things worse, I think.
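
To make the 2x concrete, here's a back-of-envelope sketch. The seek
and transfer times are assumptions I've picked for illustration, not
measurements:

/* Back-of-envelope arithmetic for the 2x claim.  The times are
 * assumptions (roughly a 10 ms seek, 1 ms to transfer a small file),
 * not measurements. */
#include <stdio.h>

int main(void)
{
    const double seek_ms = 10.0;    /* assumed average seek + rotation */
    const double xfer_ms = 1.0;     /* assumed transfer time, small file */
    const int nfiles = 1000;
    double two_seeks, one_seek;

    /* ext2fs without inode read-ahead: seek to inode, seek to data */
    two_seeks = nfiles * (2.0 * seek_ms + xfer_ms);
    /* tarfs (or ext2fs with inodes already cached): one seek to data */
    one_seek = nfiles * (1.0 * seek_ms + xfer_ms);

    printf("2 seeks/file: %.0f ms  1 seek/file: %.0f ms  speedup: %.2fx\n",
           two_seeks, one_seek, two_seeks / one_seek);
    return 0;
}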

Getting the 10x speedup with tarfs that Larry talks about requires
lots of read-ahead. It seems to me the win comes from massive
read-ahead, not from the kind of filesystem. I think we're better off
with reiserfs than tarfs. Couple that with massive read-ahead and we
should see that 10x speedup.
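
Here's the same back-of-envelope sketch for the read-ahead case;
again the numbers are assumptions, chosen only to show how paying one
seek per read-ahead window instead of one per file gets you into 10x
territory:

/* Back-of-envelope arithmetic for the read-ahead claim.  Same assumed
 * numbers as before; files_per_window is the (assumed) number of small
 * files pulled in by one big sequential read-ahead. */
#include <stdio.h>

int main(void)
{
    const double seek_ms = 10.0;    /* assumed */
    const double xfer_ms = 1.0;     /* assumed, per small file */
    const int nfiles = 1000;
    const int files_per_window = 100;
    double per_file, per_window;

    /* one seek per file, no read-ahead */
    per_file = nfiles * (seek_ms + xfer_ms);
    /* one seek per read-ahead window, transfer everything sequentially */
    per_window = (nfiles / files_per_window) * seek_ms + nfiles * xfer_ms;

    printf("1 seek/file: %.0f ms  1 seek/window: %.0f ms  speedup: %.1fx\n",
           per_file, per_window, per_file / per_window);
    return 0;
}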

I still don't see what the great advantage of tarfs over ext2fs is,
other than that 2x speedup, which is easy enough to get by reading
ahead the inode blocks. And IIRC, reiserfs already gives you that 2x
speedup because of the way it packs things.

Once you get down to one seek per file (assuming no massive
read-ahead), all schemes that achieve << 1 seek per file are built on
massive read-ahead in one form or another. In a sense, tarfs (or any
other scheme) is just a facade on top of that essential foundation.

Again, if I'm wrong, please show me exactly how tarfs is intrinsically
better at getting << 1 seek per file.

Regards,

Richard....

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/