> On Sat, 6 Feb 1999, Alexander Viro wrote:
> > D'oh. Now, if it would be the only bad thing about rename()...
> > In essence, we have it serialized by a per-fs lock. It's done in each
> > fs separately. And it prevents us from moving other generic tests to VFS,
> > where they belong.
> Yeah, that would be good. "rename()" is a real pain. It's _the_ most
> complex operation, mainly because of renaming subdirectories has such a
> non-local effect.
> > I'ld be more than happy to have this puppet moved to
> > vfs_rename(), along with is_subdir() stuff and check for source and target
> > having the same inode. Then foo_rename()s would start to look sane. But it
> > requires adding new fields to struct super (lock and wait queue) . If
> > you'll give clearance I'll do it ASAP.
> I'll give you clearance, not for 2.2.2, but you wouldn't get it done by
> then anyway, so 2.2.3 would be your target. HOWEVER, I'd ask you to
> actually go one step further: the VFS layer should separate the case of
> renaming a subdirectory from the case of renaming a regular file.
> In the end, both cases would call the same fs-specific rename() function,
> but basically something like this would be what I'd want to see to make
> each fs not have to worry about locking at all, the way it should be:
Urm... I've already started coding and it fits nicely into vfs_rename().
It's just a matter of single 'if (isdir)' in vfs_rename(), right before
the call of ->rename().
> - a non-directory "rename()" would _not_ get the per-fs rename lock. They
> don't need them, the per-fs lock is only used to protect against
> concurrent renames that change the topology (so that you can sanely
> avoid loops in the directory structure)
Already done that way.
> - the directory case is different in other ways too: it needs to do the
> is_subdir() check (which should also be done on a VFS layer, as it
> these days is just a matter of following the dentry pointers), and it
> needs to look up the target dentry with the LOOKUP_SLASHOK flag.
> - the directory case would unhash the target directory in the VFS layer
> as per your race fix, and would re-hash it if the low-level rename
> returns an error.
> - the VFS layer would do the "d_move()" operation (both for directory and
> non-directories), the low-level FS would only do the actual low-level
Fine. That's what I was going to do.
> If we have to change rename (and all low-level filesystems) to fix the
> race, let's just fix it once and for all, and separate the two rename
> cases properly.
Could you elaborate? I'ld see the point if we would split the *method*,
but I don't see the reason for splitting the VFS-level code. Method has to
distingush between directory/non-directory move anyway - '..' flipping and
emptiness check are fs-specific. Ditto for nlink overflow checking (BTW,
it asks for additional field in struct super, IMHO. Say it, s_max_nlink).
vfs_rename() already checks for directory/non-directory and the checks in
question fit nicely into it. Did I understand you right and you want to
split it into two separate functions? It's doable, indeed, but I don't see
the reason. Anyway, it's a matter of cut'n'paste, adding a new entry to
export list and putting a trivial if() into nfsd/vfs.c, so if you really
want it - no problem.
BTW, we'll have to redo it in 2.3 - serialization is way too rough right
now and we can easily make if finer if we'll use d_flags to mark source of
rename() in progress (if there are no marked points on the path from
target to the closest common ancestor of target and source we can go ahead
without waiting for other renames). We need an unroll mechanism for
lookups to deal with unlink/create race and lookup/rename races, anyway,
so IMHO it's a 2.3 stuff.
ObRaces: assume that /foo/bar/a and /foo/bar/b are both symlinks to "..".
Now, unlink("/foo/bar/a/bar/b") and unlink("/foo/bar/b/bar/a") shouldn't
*both* succeed, right? If one of them succeeds another should give ENOENT.
Right now there is a race allowing both of them finish their lookup phases
and happily do their things. Another example: suppose we have
/foo, /foo/a, /bar, /bar/b and we are renaming /foo/a to /bar/a. Now,
/foo/a/../b is ENOENT before rename() (no /foo/b) and is ENOENT after
rename() (no /foo/a). With the current code it may succeed giving /bar/b.
What we need is a mechanism that would do right ordering/unrolls of
lookups. I'm going to submit it for 2.3 - hope I'll debug it till then.
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to email@example.com
Please read the FAQ at http://www.tux.org/lkml/