UBIFS seekdir()/telldir() issue

From: Artem Bityutskiy
Date: Mon May 26 2008 - 09:57:21 EST


The problem
~~~~~~~~~~~

telldir() returns an opaque 32-bit value which should be enough
to seek to the position later and continue readdir()-ing.

UBIFS stores direntries in a B-tree with 64-bit keys like:
| Parent Ino (32-bit) | key type (3 bits) | name hash (29 bits) |

The name hash may have collisions. Obviously, this does not
quite fit to the telldir()/seekdir() model.

Note, this is very similar to reiserfs and reiser4 approach.

Current UBIFS implementation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

At the moment UBIFS stores the hash value in file->f_pos, so
telldir() basically returns direntry name hash values. And this
does not quite work in case of hash collisions - seekdir() may
not always seek correctly, which is rare though.

However, consecutive readdir() calls work fine because we store
full name in file->private_data which helps to deal with
collisions.

Naturally, readdir() returns direntries in hash order.

It seems there are not much applications which depend on telldir()
and seekdir(), so this does not look too bad, at least in embedded
world. But this means NFS will not really work because it needs
"absolute" seekdir() implementation.

Possible approach
~~~~~~~~~~~~~~~~~

Short investigation showed that other file-systems have separate
trees which are needed just for the sake of seekdir(). Well, it is
sad that FSes have to do this just because of not very good
interface, but such is live.

In UBIFS we are very reluctant to maintain any additional on-flash
tree which would emulate traditional Unix "directory is an array
of direntries" view. It is just too much overhead for embedded
systems. And it would add more complexity to an already complex
file-system.

Instead, we would like to do the following.

1. Assign a 32-bit unique id for each direntry. Each directory
inode has its own space of the ids.
2. readdir() stores the id of the next direntry to return in
file->f_pos, which means telldir() will return the unique ids,
not hashes as it is currently done.
3. However, readdir() still walks direntries in hash, and
it stores the key and full name of the next entry in
file->private_data. This makes consecutive readdir() calls
work correctly and efficiently.
4. When seeking, UBIFS havs to scan the main B-tree and
look up direntry with the required ID. This is not efficient.

This approach gives full seekdir() support, except of one case
described below.

Problems with this approach would be:
1. Slow seeks. But we think that actually rare applications do
seekdir(), at least in embedded world. Comments?

2. If the direntry we are trying to seek to was _deleted_, we'll be
unable to gracefully start from the closest next one. This is simply
because readdir() returns entries in hash order, not in id order.
The best UBIFS would be able to do in this case is to seek to the
beginning of the directory.

IOW, sequence like this would be problematic:

dir = opendir();
pos = telldir(dir);
dent = readdri(dir);
unlink(dent->d_name);
closedir(dir);
dir = opendir();
seekdir(pos); // Here we do not know where to seek

But we think this is also quite rare use-case and it should not
hurt to seek to the beginning in these cases. Comments?

Thanks

--
Best Regards,
Artem Bityutskiy (ÐÑÑÑÐ ÐÐÑÑÑÐÐÐ)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/