Re: [reiserfs-dev] Re: Ext2 directory index: ALS paper and benchmarks

From: Daniel Phillips (phillips@bonn-fries.net)
Date: Sat Dec 08 2001 - 21:39:38 EST


On December 8, 2001 08:16 pm, Ragnar Kjørstad wrote:
> On Sat, Dec 08, 2001 at 03:15:50AM +0300, Hans Reiser wrote:
> > >>no, actually this is a problem for v3. stat data are time of creation
> > >>ordered (very roughly speaking)
> > >>and directory entries are hash ordered, meaning that ls -l suffers a
> > >>major performance penalty.
> > >
> > >Yes, just remember that file-body ordering also has the same problem.
> > >(ref the "find . -type f | xargs cat > /dev/null" test wich I think
> > >represent maildir performance pretty closely)
> >
> > So is this a deeply inherent drawback of offering readdir name orders
> > that differ hugely from time of creation order?
>
> It should not be, if:
> * If the cache was smart enough to detect that the user is reading all
> the files in a directory and started reading in files into memory
> ahead of time. It sounds ugly, so I don't know if I like it.
> or
> * file-bodies were ordered by hash as well.
>
> > I suppose we could use objectids based on the hash of the first
> > assigned filename plus a 60 bit global to the FS counter....
> >
> > but it is too many bits I think. I think that using substantially less
> > than the full hash of the name that is used for directory entry keys
> > doesn't work.... Comments welcome.
>
> The abould stort the file-bodies in optimal order in the three, but
> block-allocation is a seperate issue, right? Even if block-allocation
> would take objectids into account it would be nearly impossible to get
> the optimal order of the file-bodies, because you don't know the number
> of files and their sizes before allocating the blocks for the first
> files. (unless you would move files around after they are allocated)
>
> So, I think the _only_ way to get the optimal performance for a growing
> directory is to do allocation and ordering by creating-time.
>
> That said, maybe trying to find algorithms that are order sub-sets of
> the directories entries in optimal order rather than the whole directory
> is perhaps more constructive? Or look at repackers or other utilities to
> reorder data in batch instead?
>
> So how is this done in ext2/3 with directory indexes, Daniel? Are there
> any papers available, or would I have to decifer the source?

You should find this useful:

   http://people.nl.linux.org/~phillips/htree/paper/htree.html
   http://people.nl.linux.org/~phillips/htree/htree.ps.gz

The coherency between inode order and file body order is handled for me
in the existing Ext2 code base. I haven't really dug into that algorithm but
it seems to produce servicable results. Note: Al Viro has taken a look at
improving that code. It's an ongoing project that's been discussed on lkml
and ext2-devel.

As far as coherency between readdir order and inode order goes, I'v just
left that dangling for the moment. This doesn't hurt until we get over a
million files/directory, and then it doesn't hurt an awful lot. As I
mentioned earlier, I think the increased table thrashing exhibited over the
million file mark is more because of shortcomings in icache handling than
anything else.

In the long run I plan to do some work on inode allocation targets to improve
the correspondence between readdir order and inode order.

--
Daniel
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Sat Dec 15 2001 - 21:00:14 EST