Re: patch for 2.1.56 fs/readdir

Linus Torvalds (torvalds@transmeta.com)
Sat, 20 Sep 1997 10:57:39 -0700 (PDT)


On Sat, 20 Sep 1997, Bill Hawes wrote:
>
> In reviewing the ext2 readdir routine I noticed a lot of heuristic
> checks to try to avoid changes in directory blocks while a directory
> scan is in progress. As the new VFS now holds the parent directory
> semaphore for operations that change directories, we can use this to
> protect i_op->readdir against changes.
>
> The attached patch adds semaphore locking to fs/readdir.c, and if this
> checks out OK we should be able to remove a lot of time-consuming checks
> from the ext2 readdir code. I'm running it here with no problems.

Note that a lot of the problems with "readdir()" are only partly due to
changing the directory while we read it (although I agree with this patch
and will be applying it). Some of the checks are due to another issue:
re-validating the position in the directory because somebody might have
done a 'lseek()' on it.

But I agree: with the directory locked, we need to validate our position
only at the beginning of the readdir(), not inside the loop.

However, this _does_ bring up another issue that I have been wondering
about: would it be better to make the inode semaphore not a semaphore but
a read-write lock? We don't have any blocking read-write locks, but they
shouldn't be hard to implement similarly to the rw-spinlocks.

Using a rw-lock instead of a semaphore would allow us to do sane locking
for reads too. Right now we do _no_ locking for reads, because we want to
allow as much parallellism as possible, and reads can usually be handled
reasonably simply by optimistic checking rather than locking. But a
rw-lock would simplify some things.

(In particular I'd like to have a rw-lock for lookups, but right now
lookups actually need the exclusivity of the write-lock to single-thread
the access to the incomplete dentry we create - I haven't come up with a
good solution to this, but I dislike the necessity)

rw-locking would make some things so much cleaner (right now a reader can
see a partial write if it happens to get in at the right moment and with
the right page fault behaviour - it isn't really a problem for any sane
usage, but it does have some strange implications)

Linus