> > The change should dramatically improve path lookups (the inode for the
> > next directory segment is inline with the name, so saves you a seek
> > per segment), random open()'s (saves you a seek per open), find(1)'s
> > all over, etc etc.
>
> If you have plenty of memory the odds of the directory not being in
> memory when you fetch the inodes are pretty low IMHO, especially as the
> directory will be readahead.
Even in that case, there's still a win from not needing to allocate a
fixed number of inodes.
In practice, on a large server, it's rare to get a very high level of
cache hits (a 3 million file filesystem would need 384M of RAM just to
hold the inode tables in the best case, ignoring all the directories,
the other meta-data, and the on-going disk activity).
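A rough check of the inode-table estimate above, as a sketch only. It
assumes 128-byte on-disk inodes (the classic ext2 size; that number is
my assumption, it isn't stated in this thread), and the function name
is made up for illustration:

```python
# Assumption: 128 bytes per on-disk inode (classic ext2 size).
INODE_SIZE_BYTES = 128


def inode_table_bytes(num_files, inode_size=INODE_SIZE_BYTES):
    """Best-case memory needed to keep one inode per file cached.

    Ignores directories, other meta-data, and ongoing disk activity.
    """
    return num_files * inode_size


total = inode_table_bytes(3_000_000)
print(total // 10**6, "MB")  # decimal megabytes
```

A different inode size scales the result linearly, so the conclusion
(hundreds of megabytes, not kilobytes) holds for any realistic size.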
> > Comments?? (people dying to implement such a beast? :)
>
> Are you sure its not in fact the time taken to walk directories that is
> doing the damage, especially if they are big directories. If so then a
> btree directory format makes more sense.
My example case has fewer than 100 entries per directory (LOTS of
directories, though).
Michael.