Re: Ext2fs and hashed table.

Paul H. Hargrove (hargrove@sccm.Stanford.EDU)
Mon, 2 Jun 1997 15:34:47 -0700 (PDT)


Matthias Urlichs writes:
> R.E.Wolff@BitWizard.nl (Rogier Wolff) writes:
[snip]
> > find / -name blabla\*
> >
> > is an operation that you can do REALLY FAST on an HFS filesystem.
> >
> Well, you still have to scan through the entire B-tree because the index is
> <directoryID,filename>. The speedup comes more from being able to do all
> this with one system call (not "one call per directory", but "one call per
> file you end up finding"), which lets the OS and the disk do some readahead.
[snip]

The order of the btree nodes on disk has no relation to their order in
the catalog tree structure, so there is probably no readahead
advantage at all. However, a carefully constructed search could be
kinder to the buffer cache by telling it not to cache blocks after the
search is done with them (since it wouldn't return to the same block).

You do need to search every directory independently, since the dirID
is the primamary sort field. However, the fact the the secondary sort
is on the filename makes possible a system call which would do a
binary search to find all the entries in the a given directory with
names in a given range. (name-prefix search is just a special case of
range search.)

A system call which did the binary search on each directory in dirID
order would reduce some traversal overhead, but it wouldn't be able to
return pathnames without visiting the directories in the order given
by the directory hierarchy. So you might as well let the user space
tool to the directory traversal and go with "one call per directory".
This way would also allow for effeciently searching a subtree.

-- 
Paul H. Hargrove                   All material not otherwise attributed
hargrove@sccm.stanford.edu         is the opinion of the author or a typo.