Re: [RFC] atomic create+open

From: Miklos Szeredi
Date: Thu Oct 06 2005 - 15:19:12 EST


> >
> > When open_namei() gets around to following the mounts, it is not there
> > any more, so the dentry for /mnt/foo (the NFS one is returned) and
> > NFS's ->open is called on the file, which returns -ENOENT. But
> > open(..., O_CREAT, ...) should never return -ENOENT.
>
> ...and so the VFS can recognise the case, and be made to retry the
> operation.
> A more difficult race to deal with occurs if you allow a mount while
> inside d_revalidate(). In that case NFS can end up opening the wrong
> file.

Yes, in fact all my examples should have been for ->d_revalidate(),
since it's not possible that ->lookup() be called for a mounted
dentry.

> Both these two races could, however, be fixed by moving the
> __follow_mount() in open_namei() inside the section that is protected by
> the parent directory i_sem.

No. Only the namespace semaphore could protect against mount/umount,
but you don't want to take that in lookup logic.

> In any case, all you are doing here is showing that the situation w.r.t.
> mount races and lookup+create+open is difficult. I see nothing that
> convinces me that a special atomic create+open will help to resolve
> those races.

I just think that filesystem code should _never_ need to care about
mounts. If you want to do the lookup+open, you somehow will have to
deal with mounts, which is ugly.

> Nor do I see that adding a special atomic create+open will help me avoid
> intents for the case of atomic lookup+open(). As far as I'm concerned,
> the case of lookup+create+open is just a special case of lookup+open.

IMO the lookup+open optimization is not valid, because dealing with
mounts is the job of the VFS and not the filesystem.

Path lookup usually ends in a sequence of ->lookup (or
->d_revalidate), then following mounts, then doing the operation. You
shouldn't try to merge the ->lookup and the ->fsop into one operation.
That way leads to madness.

Miklos
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/