Re: Is any file system on Linux appropriate for very large directories?

Matthias Urlichs (smurf@smurf.noris.de)
Thu, 25 Jul 1996 17:00:12 +0100


In linux.dev.kernel, article <199607241957.MAA06475@muddcs.cs.hmc.edu>,
"Michael J. Micek" <mmicek@muddcs.cs.hmc.edu> writes:
> =20
> Copious information about skiplists is to be found at
>=20
> ftp://ftp.cs.umd.edu/pub/skipLists
>=20
There's a big problem with skip lists -- they distribute all their data=
all
over the place. B*-trees have much better locality.

Using a skip list, you have to follow the chain of entries on each leve=
l to
find out whether you need to continue, or whether you can proceed to th=
e
next level. Each follow-the-link, including the last (where you read da=
ta
you can't really use), is one disk read because the entries are rather
unlikely to be colocated.

With a B* tree, you need to do exactly one read operation per level, an=
d
the higher levels are likely to be in cache anyway.

My conclusion, therefore, is that a skip list is a rather stupid idea w=
hen
you try to apply it to the directory of an on-disk file system.

And, while I agree with Linus that you should probably use a multilevel
scheme, sometimes things do go wrong. A cron job that doesn't clean up
after itself will leave 30000 temporary files behind when the system is=
up
a few weeks. The mailer stores its temporary files in a flat directory.=
The
news system stores all articles in one directory (that one should chang=
e in
the near future, but it's that way now). Anyway, mistakes happen. It's =
very
expensive in CPU time to clean up a directory that's grown to beyond 50=
k or
so, "ls" is _slow_ even if it's not sorting, and these beasts don't eve=
n
shrink their size back once you have cleaned them...

All these problems could be avoided with things like B*-tree directorie=
s.
Creating an alternate version of ext2, which does everything just like =
the
"real" ext2 file system but which uses B*-trees instead of linear lists=
for
its directory storage (or even, in addition to -- for new directories, =
if
you tell it to with a mount option), shouldn't be _too_ difficult to cr=
eate
-- I'd be very interested how much the performance difference on our N=
ews
server would be...

--=20
Write home; they miss you.
--=20
Matthias Urlichs \ noris network GmbH / Xlink-POP N=FCrnberg=
=20
Schleiermacherstra=DFe 12 \ Linux+Internet / EMail: urlichs@nor=
is.de
90491 N=FCrnberg (Germany) \ Consulting+Programming+Networking+etc=
'ing
PGP: 1024/4F578875 1B 89 E2 1C 43 EA 80 44 15 D2 29 CF C6 C7 E0 D=
E
Click <A HREF=3D"http://info.noris.de/~smurf/finger">here</A>. =
42