Re: make getdents/readdir POSIX compliant wrt mount-point dirent.d_ino
From: Jim Meyering
Date: Wed Nov 04 2009 - 14:29:15 EST
Ulrich Drepper wrote:
> On Tue, Sep 1, 2009 at 13:19, Theodore Tso <tytso@xxxxxxx> wrote:
>> Furthermore, there are
>> plenty of Unix systems that have received POSIX certifications despite
>> having this behavior.
> A common misunderstanding of certification.
> Like for all certifications, being POSIX certified doesn't mean the
> certification is valid for all situations. It only means that there
> is (at least) one configuration which meets the requirements. In this
> case it means the environment simply uses one single filesystem and no
> mount points. This way the problem doesn't even arise.
>> Fixing it is also going to be decidedly non-trivial since it depends
>> on how the directory was originally accessed. [...]
> I guess this is really difficult to solve. I wouldn't want
> to pay for something which is hardly ever really used.
> But there are programs out there which would like to use the inode
> uniqueness. Therefore the next best thing to do is perhaps to return
> a flag in the getdents information (in d_type, perhaps) to indicate
> that this is a mount point and/or that there are multiple ways to
> access the file in question. Then programs which can use the inode
> information can watch for this flag and enter the slow path only
> if it's set.
Here is another reason to do what you suggest.
This bug report started it:
on the fly varying device numbers on a NFS mount point
More discussion here:
The problem is with hierarchy traversals again. The first time
a mount-point directory is encountered, fts opens it (with openat),
stats it, records its device and inode numbers (st_dev, st_ino), and
then reads its entries. The first readdir triggers the automount and
thus the assignment of a new device number to the already-open
directory. When the traversal finishes processing the hierarchy and
moves back "up" to that mount point, it fails due to the
old-st_dev/new-st_dev mismatch. Normally such a mismatch indicates
that someone is attempting to subvert a traversal, or has perhaps
inadvertently moved a subtree while it is being traversed. In any
case, once such a mismatch has been detected, the traversal cannot
safely continue.
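The dev/ino check described above can be sketched in C roughly as follows. This is a minimal illustration, not the actual fts.c code: the names `struct dir_id`, `record_dir_id` and `reopen_parent_checked` are hypothetical, and the sketch assumes the parent is re-entered by opening ".." relative to the child's descriptor.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical: identity recorded when a directory is first entered. */
struct dir_id {
    dev_t dev;
    ino_t ino;
};

/* Record st_dev/st_ino of an already-open directory descriptor.
   Returns false if fstat fails. */
static bool record_dir_id(int dirfd, struct dir_id *id)
{
    assert(id != NULL);
    struct stat st;
    if (fstat(dirfd, &st) != 0)
        return false;
    id->dev = st.st_dev;
    id->ino = st.st_ino;
    return true;
}

/* Move back "up": open ".." relative to CHILD_FD and verify that it is
   still the directory recorded on the way down.  Returns the new
   descriptor, or -1 on error or on a dev/ino mismatch. */
static int reopen_parent_checked(int child_fd, const struct dir_id *expected)
{
    int fd = openat(child_fd, "..", O_RDONLY | O_DIRECTORY);
    if (fd < 0)
        return -1;
    struct stat st;
    if (fstat(fd, &st) != 0
        || st.st_dev != expected->dev
        || st.st_ino != expected->ino) {
        close(fd);  /* mismatch: the traversal cannot safely continue */
        return -1;
    }
    return fd;
}
```

For an automounted directory, the st_dev recorded before the first readdir differs from the one seen when coming back up, so a check like `reopen_parent_checked` fails in the way the message describes.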
One way to accommodate the current automount semantics is to make fts.c
incur, _for every directory traversed_, the cost of an additional
stat (fstatat, actually) call just in case this happens to be one of
those rare mount points.
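The workaround amounts to one extra stat per directory, issued after the first readdir so that the recorded st_dev reflects any device-number change. A hypothetical sketch follows; the helper name `open_dir_refresh_id` is illustrative and not taken from fts.c, and it assumes the re-stat is done with fstatat relative to the parent descriptor.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: open directory NAME relative to PARENT_FD, read
   its first entry, then re-stat NAME.  On success, *DIRP is the open
   stream, *ST holds the post-readdir stat data, and *FIRST is the first
   entry (NULL if readdir returned nothing). */
static bool open_dir_refresh_id(int parent_fd, const char *name, DIR **dirp,
                                struct stat *st, struct dirent **first)
{
    assert(name != NULL && dirp != NULL && st != NULL && first != NULL);

    int fd = openat(parent_fd, name, O_RDONLY | O_DIRECTORY);
    if (fd < 0)
        return false;
    DIR *d = fdopendir(fd);
    if (d == NULL) {
        close(fd);
        return false;
    }

    /* On an automountable directory, this first readdir may trigger the
       mount and change the device number of the already-open directory. */
    *first = readdir(d);

    /* The extra call, paid for every directory just in case: re-stat
       NAME so that the recorded st_dev/st_ino are post-readdir values. */
    if (fstatat(parent_fd, name, st, 0) != 0) {
        closedir(d);
        return false;
    }
    *dirp = d;
    return true;
}
```

The cost is visible in the structure: the fstatat call runs unconditionally, even though only rare automount points ever need it.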
I would really rather not pessimize most[*] hierarchy-traversing
command-line tools by up to 17% (though usually far less) in order
to accommodate device-number change semantics that arise only
for automountable directories.
[*] At least the following GNU tools would be affected: find, chmod,
chown, chgrp, chcon, du, rm, and possibly soon, cp and ls.
Note that if the mounted hierarchy is not too deep (I think it's
4 or 5 levels), cached "active-directory" file descriptors mask the
problem, because when we traverse back to the mount point, we still
have an open file descriptor for that directory. In that case, we don't
even need to perform the dev/inode comparison.
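The masking effect of cached descriptors can be illustrated with a small ring of directory descriptors. The ring size of 4, the type `struct fd_ring`, and its functions are assumptions for illustration only; the message says merely that the limit is around 4 or 5 levels.

```c
#include <assert.h>
#include <unistd.h>

#define RING_SIZE 4  /* hypothetical; "4 or 5 levels" per the message */

struct fd_ring {
    int fd[RING_SIZE];  /* descriptors of ancestor directories */
    unsigned top;       /* index of the next free slot */
    unsigned count;     /* number of valid entries, <= RING_SIZE */
};

static void ring_init(struct fd_ring *r)
{
    for (unsigned i = 0; i < RING_SIZE; i++)
        r->fd[i] = -1;
    r->top = 0;
    r->count = 0;
}

/* On descent: remember the descriptor of the directory being left.
   When the ring is full, the oldest descriptor is closed and dropped. */
static void ring_push(struct fd_ring *r, int fd)
{
    assert(fd >= 0);
    if (r->count == RING_SIZE) {
        close(r->fd[r->top]);  /* when full, slot TOP holds the oldest */
        r->count--;
    }
    r->fd[r->top] = fd;
    r->top = (r->top + 1) % RING_SIZE;
    r->count++;
}

/* On ascent: return the cached parent descriptor, or -1 if it was
   evicted, in which case the caller must reopen ".." and compare
   dev/ino against the values recorded on the way down. */
static int ring_pop(struct fd_ring *r)
{
    if (r->count == 0)
        return -1;
    r->top = (r->top + RING_SIZE - 1) % RING_SIZE;
    r->count--;
    int fd = r->fd[r->top];
    r->fd[r->top] = -1;
    return fd;
}
```

Within the cached depth, `ring_pop` hands back the very descriptor opened on the way down, so no comparison is made and a changed st_dev goes unnoticed. Only once a descriptor has been evicted does the caller fall back to reopening ".." and comparing, which is where the automount mismatch surfaces.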