lock_parent() race? [was: Re: 2.1.50/2.1.51: that "oops" thing]

Ingo Molnar (mingo@pc7537.hil.siemens.at)
Wed, 20 Aug 1997 10:14:39 +0200 (MET DST)


On 20 Aug 1997, Linus Torvalds wrote:

[...]
> Can you run the above through ksymoops, I'd really like to see where we
> got to this from.

maybe unrelated, but i think lock_parent() is unsafe in it's current form:

static inline struct inode *lock_parent(struct dentry *dentry)
{
struct inode *dir = dentry->d_parent->d_inode;

down(&dir->i_sem);
return dir;
}

now, if we use lock_parent() in say sys_rmdir(), we do this because we
want to remove an ext2fs directory entry. The entry was looked up as
'dentry', the directory currently holding this entry is 'dir'.

Now, what happens if down(&dir->i_sem); sleeps. We have _no_ guarantees
what happens to the dentry tree. Dentries can be moved around at will it
seems, the only guaranteed thing is, that 'dentry' wont be _removed_. But
otherwise, we carry our assumption across a blocking point: the pointer
'dentry->d_parent'. This pointer is not protected, unless i'm terribly
offtrack.

one typical way to crash: directory gets moved from a to b, and then
removed again, both in the old place and in the new place, all in
parallel. This race window seems to be wide for all filesystems that might
introduce sudden delays to parent locking (autofs, cdroms).

now, in ext2fs, the above sys_rmdir() scenario is IMO hidden by a
bug/hidden feature/overlapping funcionality in ext2_rmdir():

retval = -EIO;
if (inode->i_dev != dir->i_dev)
goto end_rmdir;

why do we return with -EIO here? Both the entry and the directory is
kernel-provided, no physical disk error should cause 'i_dev != i_dev', i
think.

the more i think about lock_parent(), the uglier it gets. Our dentry-tree
is currently lock-less, with some garbage collection scheme for lockless
remove, but otherwise we cannot assume anyhting about dentry _links_ when
we sleep? am i missing something?

shouldnt sys_rmdir() do the following thing rather, to resolve the race:
look up the dentry, take the parent, store the last _name component_, lock
the parent, look up the name component and delete if present, -ENOENT if
not present. I cant see any other safe way.

-- mingo