Re: Thinking outside the box on file systems

From: Craig Ruff
Date: Wed Aug 15 2007 - 14:49:27 EST


On Wed, Aug 15, 2007 at 10:30:19AM -0700, Marc Perkel wrote:
> --- Kyle Moffett <mrmacman_g4@xxxxxxx> wrote:
> > Except they do, and without directories the
> > performance of your average filesystem is going to suck.
>
> Actually you would get a speed improvement. You hash
> the full name and get the file number. You don't have
> to break up the name into sections except for
> evaluating name permissions.
>
> The important concept here is that files and name
> aren't stored by levels of directories. The name
> points to the file number. Directory levels are
> emulated based on name separation characters or any
> other algorithm that you want to use.
>
> One could create a file system and permission system
> that gets rid of the concept of directories entirely
> if one chooses to.

I would like to add support for Kyle's assertion.

The model described by Marc is exactly the method used by the current
version of the NCAR Mass Storage Service (MSS), which is data archive
of 4+ petabytes contained in 40+ million files. To the user's point
of view, it looks somewhat like a POSIX file system with both some
extensions and deficiencies. The MSS was designed in the mid-1980s,
in an era where the costs of the supercomputers (Cray-1s at that time)
were paramount. This lead to some MSS design decisions to minimize the
need for users to rerun jobs on the expensive supercomputer just because
they messed up their MSS file creation statements.

Files names are a maximum of 128 bytes, with a dynamically managed
directory structure indicated by '/' characters in the name. The file
name is hashed, and the hash table provides the internal file number (the
address in the Master File Directory (MFD)). Any parent directories
are created automatically by the system upon file creation, and are
automatically deleted if empty upon file deletion. Directories also
have a self pointer, and both files and directories are chained together
to allow the user to list (or otherwise manipulate) the contents of
a directory.

The biggest problem with this model is that to manipulate the a directory
itself, you have to simulate the operation on all of the files contained
within it. For example to rename a directory with 'n' descendants,
you must perform:

n+1 hash table removals
n+1 hash table insertions (with collision detection)
n+1 MFD record updates
1 directory chain removal
1 directory chain insertion

This is, needless to say, very painful when n is large. Since users
must use directory trees to efficiently manage their data holdings,
efficient directory manipulation is essential. Contrast this with
the number of operations required for a directory rename if files
do not record their complete pathname:

1 directory chain removal
1 directory chain insertion

Fortunately we are currently working to change from using a model like
Marc describes to one Kyle describes.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/