Re: [uml-devel] Re: [PATCH 12/12] HPPFS: fix nameidata handling

From: Al Viro
Date: Wed Sep 21 2005 - 12:23:47 EST


On Wed, Sep 21, 2005 at 06:34:11PM +0200, Blaisorblade wrote:
> > And this is absolutely bogus. The whole point of ->follow_link() is to
> > move where your nameidata points to. So no, you do _not_ want to flip
> > nameidata, you do not want to drop it and you certainly do not want
> > to flip it back. Just call the underlying one.

> But wouldn't the callee expect that nd->dentry is the same thing which I pass?

No.

> Yes, nameidata has other fields too, but is ->dentry meant to be unused here?
> Not surely.

Yes, it is used. As in "here's where we stand in the tree, move as
specified by your symlink (or leave the pathname to be traversed by caller)".

> And especially, how would proc_pid_follow_link()'s path_release() call handle
> that?

It would drop the reference to previous location and slap relevant
vsmount and dentry instead - which is exactly what we want from
e.g. following /proc/<pid>/root.

->follow_link() gets your current location (in nameidata) and its job
is to shift it according to your symlink. Which will end up with
original references being released, ones to new location grabbed and
left in nameidata.

Which is exactly what proc_pid_follow_link() is doing. The whole point
of ->follow_link() is its effect on nd->{mnt,dentry}.

One optimization comes from the fact that
* 99% of symlinks end up with "get a pathname out of filesystem,
traverse that pathname"
* we are in the middle of mutual recursion and sensitive to call
chain depth.

So while ->follow_link() _can_ do it itself (call vfs_follow_link(nd, string),
which will move nameidata according to pathname we pass to it), it can just
leave the pathname to caller. If your ->follow_link() had done
nd_set_link(nd, string)
the caller will do the rest - call vfs_follow_link() *after* we'd returned
from ->follow_link(). Then it will call your ->put_link() (if present)
to undo whatever you've done to pin the string down.

See __do_follow_link() in fs/namei.c - that's what calls these suckers.
The reason for allowing to leave pathname traversal to caller is simple -
then our ->follow_link() is not in the mutual recursion anymore and we
win stack space.

Examples:
1) /proc/self:
static void *proc_self_follow_link(struct dentry *dentry, struct nameidata *nd)
{
char tmp[30];
sprintf(tmp, "%d", current->tgid);
return ERR_PTR(vfs_follow_link(nd,tmp));
}
calls vfs_follow_link() itself, nothing left to do in caller.

2) ext2 fast symlinks:
static void *ext2_follow_link(struct dentry *dentry, struct nameidata *nd)
{
struct ext2_inode_info *ei = EXT2_I(dentry->d_inode);
nd_set_link(nd, (char *)ei->i_data);
return NULL;
}
symlink body is embedded into in-core inode, just tell the caller to
traverse it, no cleanup needed since inode will stay pinned down
all that.

3) ext2 normal symlinks (and the same thing is used by a lot of other
filesystems; that's the ones that have symlink look like a file with
->readpage() bringing its contents into page cache):
void *page_follow_link_light(struct dentry *dentry, struct nameidata *nd)
{
struct page *page = NULL;
nd_set_link(nd, page_getlink(dentry, &page));
return page;
}
symlink body is in page cache; read it, pin it down and tell the caller to
traverse it. Cleanup is done by
void page_put_link(struct dentry *dentry, struct nameidata *nd, void *cookie)
{
struct page *page = cookie;
if (page) {
kunmap(page);
page_cache_release(page);
}
}

4) normal symlinks in /proc (such as e.g. /proc/ide/hda):
static void *proc_follow_link(struct dentry *dentry, struct nameidata *nd)
{
nd_set_link(nd, PDE(dentry->d_inode)->data);
return NULL;
}
same story as with ext2 fast symlinks, no cleanup needed.

5) /proc/<pid>/cwd and its ilk:
static void *proc_pid_follow_link(struct dentry *dentry, struct nameidata *nd)
{
struct inode *inode = dentry->d_inode;
int error = -EACCES;

/* We don't need a base pointer in the /proc filesystem */
path_release(nd);

if (current->fsuid != inode->i_uid && !capable(CAP_DAC_OVERRIDE))
goto out;
error = proc_check_root(inode);
if (error)
goto out;

error = PROC_I(inode)->op.proc_get_link(inode, &nd->dentry, &nd->mnt);
nd->last_type = LAST_BIND;
out:
return ERR_PTR(error);
}
this is not a traversal of some pathname, so we do everything ourselves -
release the original location, find where to go and slap new dentry/vfsmount
into nd->dentry/nd->mnt.


What HPPFS should be doing:
1) have your ->follow_link() call ->follow_link() of underlying
procfs inode and just return whatever it had returned.
2) have ->put_link() that would call ->put_link() of underlying
procfs inode, passing it underlying dentry, nameidata it had been given
and pointer it got as the last argument. If underlying inode has no
->put_link(), do nothing.
3) if you want to override a symlink, refer to examples above.
At a guess, it will be along the lines of (2) and (4).
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/