Re: [rfc] Near-constant time directory index for Ext2

From: Daniel Phillips (
Date: Wed Feb 21 2001 - 18:42:42 EST

"H. Peter Anvin" wrote:
> Martin Mares wrote:
> >
> > > True. Note too, though, that on a filesystem (which we are, after all,
> > > talking about), if you assume a large linear space you have to create a
> > > file, which means you need to multiply the cost of all random-access
> > > operations with O(log n).
> >
> > One could avoid this, but it would mean designing the whole filesystem in a
> > completely different way -- merge all directories to a single gigantic
> > hash table and use (directory ID,file name) as a key, but we were originally
> > talking about extending ext2, so such massive changes are out of question
> > and your log n access argument is right.
> It would still be tricky since you have to have actual files in the
> filesystem as well.

Have you looked at the structure and algorithms I'm using? I would not
call this a hash table, nor is it a btree. It's a 'hash-keyed
uniform-depth tree'. It never needs to be rehashed (though it might be
worthwhile compacting it at some point). It also never needs to be
rebalanced - it's only two levels deep for up to 50 million files.

This thing deserves a name of its own. I call it an 'htree'. The
performance should speak for itself - 150 usec/create across 90,000
files and still a few optmizations to go.

Random access runs at similar speeds too, it's not just taking advantage
of a long sequence of insertions into the same directory.

BTW, the discussion in this thread has been very interesting, it just
isn't entirely relevant to my patch :-)

To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to
More majordomo info at
Please read the FAQ at

This archive was generated by hypermail 2b29 : Fri Feb 23 2001 - 21:00:26 EST