Re: [PATCH v1 2/2] signal: add procfd_signal() syscall

From: Serge E. Hallyn
Date: Thu Nov 22 2018 - 02:41:20 EST


On Tue, Nov 20, 2018 at 08:23:43AM +1100, Aleksa Sarai wrote:
> On 2018-11-20, Aleksa Sarai <cyphar@xxxxxxxxxx> wrote:
> > On 2018-11-19, Christian Brauner <christian@xxxxxxxxxx> wrote:
> > > On Tue, Nov 20, 2018 at 07:28:57AM +1100, Aleksa Sarai wrote:
> > > > On 2018-11-19, Christian Brauner <christian@xxxxxxxxxx> wrote:
> > > > > + if (info) {
> > > > > + ret = __copy_siginfo_from_user(sig, &kinfo, info);
> > > > > + if (unlikely(ret))
> > > > > + goto err;
> > > > > + /*
> > > > > + * Not even root can pretend to send signals from the kernel.
> > > > > + * Nor can they impersonate a kill()/tgkill(), which adds
> > > > > + * source info.
> > > > > + */
> > > > > + ret = -EPERM;
> > > > > + if ((kinfo.si_code >= 0 || kinfo.si_code == SI_TKILL) &&
> > > > > + (task_pid(current) != pid))
> > > > > + goto err;
> > > > > + } else {
> > > > > + prepare_kill_siginfo(sig, &kinfo);
> > > > > + }
> > > >
> > > > I wonder whether we should also have a pidns restriction here, since
> > > > currently it isn't possible for a container process using a pidns to
> > > > signal processes outside its pidns. AFAICS, this isn't done through an
> > > > explicit check -- it's a side-effect of processes in a pidns not being
> > > > able to address non-descendant-pidns processes.
> > > >
> > > > But maybe it's reasonable to allow sending a procfd to a different pidns
> > > > and the same operations working on it? If we extend the procfd API to
> > >
> > > No, I don't think so. I really don't want any fancy semantics in here.
> > > Fancy doesn't get merged and fancy is hard to maintain. So we should do
> > > something like:
> > >
> > > if (proc_pid_ns() != current_pid_ns)
> > > return EINVAL
> >
> > This isn't quite sufficient. The key thing is that you have to be in an
> > *ancestor* (or same) pidns, not the *same* pidns. Ideally you can re-use
> > the check already in pidns_get_parent, and expose it. It would be
> > something as trivial as:
> >
> > bool pidns_is_descendant(struct pid_namespace *ns,
> > struct pid_namespace *ancestor)
> > {
> > for (;;) {
> > if (!ns)
> > return false;
> > if (ns == ancestor)
> > break;
> > ns = ns->parent;
> > }
> > return true;
> > }
> >
> > And you can rewrite pidns_get_parent to use it. So you would instead be
> > doing:
> >
> > if (pidns_is_descendant(proc_pid_ns, task_active_pid_ns(current)))
> > return -EPERM;
>
> Scratch the last bit, -EPERM is wrong here. I would argue that -EINVAL
> is *somewhat* wrong because arguable the more semantically consistent
> error (with kill(2)) would be -ESRCH -- but then you're mixing the "pid
> is dead" and "pid is not visible to you" cases. I'm not sure what the
> right errno would be here (I'm sure some of the LKML greybeards will
> have a better clue.) :P

Actually I like EXDEV for this. ERMOTE also works.