Re: [RFC PATCH 3/8] kmod - teach call_usermodehelper() to use a namespace

From: Jeff Layton
Date: Sun Feb 08 2015 - 10:22:19 EST


On Sun, 08 Feb 2015 11:07:32 +0800
Ian Kent <ikent@xxxxxxxxxx> wrote:

> On Fri, 2015-02-06 at 07:08 -0500, Jeff Layton wrote:
> > On Thu, 05 Feb 2015 10:34:11 +0800
> > Ian Kent <ikent@xxxxxxxxxx> wrote:
> >
> > > The call_usermodehelper() function executes all binaries in the
> > > global "init" root context. This doesn't allow a binary to be run
> > > within a namespace (eg. the namespace of a container).
> > >
> > > Both containerized NFS client and NFS server need the ability to
> > > execute a binary in a container's context. To do this, the init
> > > process of the caller's environment is used to set up the namespaces
> > > in the same way the root init process is used otherwise.
> > >
> > > Signed-off-by: Ian Kent <ikent@xxxxxxxxxx>
> > > Cc: Benjamin Coddington <bcodding@xxxxxxxxxx>
> > > Cc: Al Viro <viro@xxxxxxxxxxxxxxxxxx>
> > > Cc: J. Bruce Fields <bfields@xxxxxxxxxxxx>
> > > Cc: David Howells <dhowells@xxxxxxxxxx>
> > > Cc: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx>
> > > Cc: Oleg Nesterov <onestero@xxxxxxxxxx>
> > > Cc: Eric W. Biederman <ebiederm@xxxxxxxxxxxx>
> > > Cc: Jeff Layton <jeff.layton@xxxxxxxxxxxxxxx>
> > > ---
> > > include/linux/kmod.h | 16 +++++++
> > > kernel/kmod.c | 115 +++++++++++++++++++++++++++++++++++++++++++++++++-
> > > 2 files changed, 128 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/include/linux/kmod.h b/include/linux/kmod.h
> > > index 15bdeed..b0f1b3c 100644
> > > --- a/include/linux/kmod.h
> > > +++ b/include/linux/kmod.h
> > > @@ -52,6 +52,7 @@ struct file;
> > > #define UMH_WAIT_EXEC 1 /* wait for the exec, but not the process */
> > > #define UMH_WAIT_PROC 2 /* wait for the process to complete */
> > > #define UMH_KILLABLE 4 /* wait for EXEC/PROC killable */
> > > +#define UMH_USE_NS 8 /* exec using caller's init namespace */
> > >
> > > struct subprocess_info {
> > > struct work_struct work;
> > > @@ -69,6 +70,21 @@ struct subprocess_info {
> > > extern int
> > > call_usermodehelper(char *path, char **argv, char **envp, int flags);
> > >
> > > +#if !defined(CONFIG_PROC_FS) || !defined(CONFIG_NAMESPACES)
> > > +inline struct task_struct *umh_get_init_task(void)
> > > +{
> > > + return ERR_PTR(-ENOTSUP);
> > > +}
> > > +
> > > +inline int umh_enter_ns(struct task_struct *tsk, struct cred *new)
> > > +{
> > > + return -ENOTSUP;
> > > +}
> > > +#else
> > > +struct task_struct *umh_get_init_pid(void);
> > > +int umh_enter_ns(struct task_struct *tsk, struct cred *new);
> > > +#endif
> > > +
> > > extern struct subprocess_info *
> > > call_usermodehelper_setup(char *path, char **argv, char **envp, gfp_t gfp_mask,
> > > int (*init)(struct subprocess_info *info, struct cred *new),
> > > diff --git a/kernel/kmod.c b/kernel/kmod.c
> > > index 14c0188..4c649d6 100644
> > > --- a/kernel/kmod.c
> > > +++ b/kernel/kmod.c
> > > @@ -582,6 +582,98 @@ unlock:
> > > }
> > > EXPORT_SYMBOL(call_usermodehelper_exec);
> > >
> > > +#if defined(CONFIG_PROC_FS) && defined(CONFIG_NAMESPACES)
> > > +#define NS_PATH_MAX 35
> > > +#define NS_PATH_FMT "%lu/ns/%s"
> > > +
> > > +/* Note namespace name order is significant */
> > > +static const char *ns_names[] = { "user", "ipc", "uts", "net", "pid", "mnt", NULL };
> > > +
> > > +struct task_struct *umh_get_init_pid(void)
> >
> > nit: we're not getting a pid here but a task_struct pointer. Maybe this
> > should be called umh_get_init_task?
>
> Ha, yep.
>
> >
> > > +{
> > > + struct task_struct *tsk;
> > > +
> > > + rcu_read_lock();
> > > + tsk = find_task_by_vpid(1);
> > > + if (tsk)
> > > + get_task_struct(tsk);
> > > + rcu_read_unlock();
> >
> > I'm not terribly familiar with the task_struct lifetime rules...
> >
> > I assume that you can be assured that tsk won't go away while you hold
> > the rcu_read_lock, but is doing a get_task_struct while holding it
> > sufficient to pin it after you drop the lock?
> >
> > IOW, could the refcount on the task_struct do a 0->1 transition here and
> > end up being freed anyway after you've grabbed a reference?
>
> Good point, I thought getting a reference under the read lock would be
> enough but maybe I need more checks as I do with dentries. I'll check
> that.
>

It looks like the rcu_read_lock is mostly there to protect the pid_hash
actually, and get_pid_task seems to do something very similar here. So,
I think you're probably fine to do what you're doing in this patch.
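
To make the 0->1 question concrete, here is a minimal userspace sketch
(C11 atomics, not kernel code) of the two ways to take a reference. The
names struct obj, obj_get, obj_get_unless_zero and obj_put are
hypothetical stand-ins, not kernel APIs. The assumption is that an
unconditional get, like get_task_struct, is only safe when something
else guarantees the count cannot already be zero, which is what the RCU
read-side section appears to provide here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a refcounted object such as a task_struct. */
struct obj {
	atomic_int usage;	/* 0 means the object is being torn down */
};

/*
 * Unconditional get: correct only if the caller already knows that
 * usage > 0 (for example because teardown is deferred while the
 * caller is inside a read-side critical section).
 */
void obj_get(struct obj *o)
{
	atomic_fetch_add(&o->usage, 1);
}

/*
 * Conditional get: refuses to take a reference once the count has
 * reached zero, so it can never perform a 0->1 transition.
 */
bool obj_get_unless_zero(struct obj *o)
{
	int old = atomic_load(&o->usage);

	while (old != 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&o->usage, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; returns true if this was the last one. */
bool obj_put(struct obj *o)
{
	return atomic_fetch_sub(&o->usage, 1) == 1;
}
```

If the lookup could ever hand back an object whose count had already
dropped to zero, only the conditional variant would be safe; the
unconditional one would resurrect an object that is about to be freed.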

That said, the "What is struct pid?" comments in include/linux/pid.h
are interesting. I wonder if my comments on your original patch were
actually unfounded. If you hold a reference to a struct pid, that might
be enough to ensure that the pid number doesn't get reused, but I'm not
sure whether at that point it could end up being detached from the task.

I suspect that pinning the actual task like you're doing here is
probably the right thing to do, but I'd certainly value input from
someone who understands the task/pid interaction better than I do.

--
Jeff Layton <jeff.layton@xxxxxxxxxxxxxxx>