Re: [RFC PATCH 3/8] kmod - teach call_usermodehelper() to use a namespace

From: Oleg Nesterov
Date: Mon Feb 09 2015 - 12:04:43 EST


On 02/09, Ian Kent wrote:
>
> On Sun, 2015-02-08 at 20:00 +0100, Oleg Nesterov wrote:
> > > +
> > > + this = file_open_root(mnt->mnt_root, mnt, path, O_RDONLY);
> > > + if (unlikely(IS_ERR(this))) {
> > > + err = PTR_ERR(this);
> > > + break;
> > > + }
> > > +
> > > + err = setns_inode(file_inode(this), 0);
> > > + fput(this);
> > > + if (err)
> > > + break;
> > > + }
> > > +
> > > + return err;
> > > +}
> >
> > Yes, I need to actually read this series and setns paths, but at first glance
> > there must be a simpler method to call ops->install's and switch_task_namespaces.
>
> Yes, the namespaces implementation does seem a bit strange in this
> respect. I mentioned that concern the first time I posted these. But I'm
> still not that clear on the big picture of how namespace are meant to
> work.
>
> It's not just access to ops->install() that's the problem.
>
> For each of the individual namespaces we open a file handle, to get
> access to ops->install() for that namespace, install it, drop "all" the
> namespaces then replace them with the new set that essentially has one
> namespace changed.

I understand. but I still can't understand why we can't implement something
like
enter_ns(struct nsproxy *p)
{
new_nsproxy = create_new_namespaces(...);

p->mnt_ns->ns->ops->install(new_nsproxy, ...);
p->pid_ns_for_children->ns->ops->install(new_nsproxy, ...);
...

switch_task_namespaces(new_nsproxy);
}

Why we should abuse fs/proc ?

See also below.

> > Sorry if this was already discussed before, but to me it looks a bit strange
> > to abuse /proc/ files for this. And again, iiuc file_open_root() can fail if
> > tsk has already exited (init can be multithreaded).
>
> Not sure that the failure is a problem though as long as it's handled
> since, if the init process of the container is gone (or will be gone
> once were done), so is the container and the caller.

Not really. Individual thread can exit while the whole "init" process can be alive.
In particular the main thread can exit and become a zombie, so find_task_by_vpid(1)
can't work in general.

You can probably use task_active_pid_ns()-child_reaper, but again I do not think
you should pass "task_struct *" to enter_ns().


And. Whatever we do, ops->install() or setns_inode() can't solve the problem with
pid_ns. You need the additional clone() to "activate" it. pidns_install() does not
actually change task_active_pid_ns().

> but in a container we want to use
> the init process of the container

Yes sure, I understand. But see above.

Oleg.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/