[rename() changes] Re: VFS postings

Alexander Viro (viro@math.psu.edu)
Mon, 8 Feb 1999 21:18:21 -0500 (EST)


On 9 Feb 1999, Magnus Ahltorp wrote:

> By a coincident, I saw a discussion about a change in the VFS
> interface. Why is this discussed on linux-kernel? Why not the fsdevel
> list?
Ooops. Sorry.
> Please, please, please, don't make life harder than it is for an fs
> developer. Is it some kind of sport keeping everything on one mailing
> list?
Magnus, I was going to mail you this evening, anyway, but yes,
fsdevel posting is in order.

> Could someone at least announce vfs changes on the fsdevel list?

Doing that. Here we come:

a) rename serialization is done on VFS level (on per-filesystem basis).
It's doen only for directory renames - other don't need serialization at
all. Impact on filesystems: no changes required. If you do your own
serialization you may drop it now or at some later point - VFS will take
care of that.

b) Two POSIX checks are done by VFS now. First of all, POSIX requires that
rename() should do nothing and silently return 0 if target and source are
links to the same file. Yes, it's braindead. So is large part of POSIX.
Second check is related to moving a directory under its descendant. POSIX
requires returning -EINVAL and doing nothing. Also done in VFS.
Impact on filesystems: no changes required. You may want to remove your
own checks - they might become redundant. Note: 'subdirectory' check is
done on dcache level, so if you have several dentries for the same
directory you still have to keep your own checks around (example: VFAT and
its handling of aliases).

c) For directory renames there is a subtle race in emptiness checking.
*All* filesystems in official tree that support rename() suffered from it.
Details: even if d_count of the target is 1 we can't safely do emptiness
check - anybody can reach the target via cached lookup (so semaphores on
parents do not help) and start to create something there. Race is between
the emptiness check and link creation. We could hold the semaphore on
target and avoid that problem, but then we'ld have to deal with all sorts
of priority inversions (rename already grabs two semaphores and there is
no way to determine whether we need the third one in advance). Soultion:
for directory renaming VFS tries to unhash the target (if it fails we
should return -EBUSY), calls the ->rename() method and when it finishes
VFS rehashes the target and does d_move() if ->rename() returned 0.
Impact on filesystems:
1. In case of directory rename the target dentry is unhashed.
2. ->rename() MUST leave doing d_move() to the caller.
3. ->rename() MAY act in assumption that if it is asked to
rename a directory it is (and will remain) the sole owner of target.

Unrelated bug in rename (not fixed by those changes):
if you are doing rename over the existing target and said target is
non-directory DON'T call unlink() on it. rename() semantics requires that
there should be no moment when lookup on target would return negative.
Several filesystems used to do it, so you may want to look at your code.
It's not just POSIX - BSD and SVID also require it.

So there... It fixes a nasty race (fs corruption *is* doable) and
moves generic tests to VFS, where they belong. Hopefully it will make fs
drivers less dependant on details of VFS and vice versa. IMHO it's a
Good Thing (tm). At least in case of filesystems in the official tree it
reduced the ->rename() complexity big way.
HAND,
Al

-- 
  Think of the linux kernel community as a large cluster of hairy SMP systems.
Changing anything in the way people write code requires flushing a huge number
of L2 caches inside people's heads        -- Michael Elizabeth Chastain on l-k

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/