Re: [PATCH] pidfd: add NSpid entries to fdinfo

From: Christian Brauner
Date: Sat Oct 12 2019 - 06:21:38 EST


On Sat, Oct 12, 2019 at 12:19:22PM +0200, Christian Brauner wrote:
> Currently, the fdinfo file of a pidfd contains the field Pid:
> It contains the pid a given pidfd refers to in the pid namespace of the
> opener's procfs instance.
> If the pid namespace of the process is not a descendant of the pid
> namespace of the procfs instance, 0 will be shown as its pid. This is
> similar to calling getppid() on a process whose parent is out of its
> pid namespace (e.g. when moving a process into a sibling pid namespace
> via setns()).
>
> Add an NSpid field for easy retrieval of the pid in all descendant pid
> namespaces:
> If pid namespaces are supported this field will contain the pid a given
> pidfd refers to for all descendant pid namespaces starting from the
> current pid namespace of the opener's procfs instance, i.e. the first
> pid entry for Pid and NSpid will be identical.
> If the pid namespace of the process is not a descendant of the pid
> namespace of the procfs instance, 0 will be shown as its first NSpid and
> no other NSpid entries will be shown.
> Note that this differs from the Pid and NSpid fields in
> /proc/<pid>/status where Pid and NSpid are always shown relative to the
> pid namespace of the opener's procfs instance. The difference becomes
> obvious when sending around a pidfd between pid namespaces from
> different trees, i.e. where no ancestral relation is present between
> the pid namespaces:
> 1. sending around pidfd:
> - create two new pid namespaces ns1 and ns2 in the initial pid namespace
> (Also take care to create new mount namespaces in the new pid
> namespace and mount procfs.)
> - create a process with a pidfd in ns1
> - send pidfd from ns1 to ns2
> - read /proc/self/fdinfo/<pidfd> and observe that the Pid and NSpid
> entries are 0
> - create a process with a pidfd in
> - open a pidfd for a process in the initial pid namespace
> 2. sending around /proc/<pid>/status fd:
> - create two new pid namespaces ns1 and ns2 in the initial pid namespace
> (Also take care to create new mount namespaces in the new pid
> namespace and mount procfs.)
> - create a process in ns1
> - open /proc/<pid>/status in the initial pid namespace for the process
> you created in ns1
> - send statusfd from initial pid namespace to ns2
> - read statusfd and observe:
> - that Pid will contain the pid of the process as seen from the init
> - that NSpid will contain the pids of the process for all descendant
> pid namespaces starting from the initial pid namespace
>
> Cc: Jann Horn <jannh@xxxxxxxxxx>
> Cc: linux-api@xxxxxxxxxxxxxxx
> Co-Developed-by: Christian Kellner <christian@xxxxxxxxxx>
> Signed-off-by: Christian Kellner <christian@xxxxxxxxxx>
> Signed-off-by: Christian Brauner <christian.brauner@xxxxxxxxxx>

I think this might be more what we want.
I tried to think of cases where the first entry of Pid is not identical
to the first entry of NSpid but I came up with none. Maybe you do, Jann?

Christian, this is just a quick stab I took. Feel free to pick this up
as a template.

Thanks!
Christian

> ---
> kernel/fork.c | 73 ++++++++++++++++++++++++++++++++++++++++++++++++++-
> 1 file changed, 72 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/fork.c b/kernel/fork.c
> index 1f6c45f6a734..b155bad92d9c 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -1695,12 +1695,83 @@ static int pidfd_release(struct inode *inode, struct file *file)
> }
>
> #ifdef CONFIG_PROC_FS
> +/**
> + * pidfd_show_fdinfo - print information about a pidfd
> + * @m: proc fdinfo file
> + * @f: file referencing a pidfd
> + *
> + * Pid:
> + * This function will print the pid a given pidfd refers to in the pid
> + * namespace of the opener's procfs instance.
> + * If the pid namespace of the process is not a descendant of the pid
> + * namespace of the procfs instance, 0 will be shown as its pid. This is
> + * similar to calling getppid() on a process whose parent is out of its
> + * pid namespace (e.g. when moving a process into a sibling pid namespace
> + * via setns()).
> + *
> + * NSpid:
> + * If pid namespaces are supported then this function will also print the
> + * pid a given pidfd refers to for all descendant pid namespaces starting
> + * from the current pid namespace of the opener's procfs instance, i.e. the
> + * first pid entry for Pid and NSpid will be identical.
> + * If the pid namespace of the process is not a descendant of the pid
> + * namespace of the procfs instance, 0 will be shown as its first NSpid and
> + * no other NSpid entries will be shown.
> + * Note that this differs from the Pid and NSpid fields in
> + * /proc/<pid>/status where Pid and NSpid are always shown relative to the
> + * pid namespace of the opener's procfs instance. The difference becomes
> + * obvious when sending around a pidfd between pid namespaces from
> + * different trees, i.e. where no ancestral relation is present between
> + * the pid namespaces:
> + * 1. sending around pidfd:
> + * - create two new pid namespaces ns1 and ns2 in the initial pid namespace
> + * (Also take care to create new mount namespaces in the new pid
> + * namespace and mount procfs.)
> + * - create a process with a pidfd in ns1
> + * - send pidfd from ns1 to ns2
> + * - read /proc/self/fdinfo/<pidfd> and observe that the Pid and NSpid
> + * entries are 0
> + * - create a process with a pidfd in
> + * - open a pidfd for a process in the initial pid namespace
> + * 2. sending around /proc/<pid>/status fd:
> + * - create two new pid namespaces ns1 and ns2 in the initial pid namespace
> + * (Also take care to create new mount namespaces in the new pid
> + * namespace and mount procfs.)
> + * - create a process in ns1
> + * - open /proc/<pid>/status in the initial pid namespace for the process
> + * you created in ns1
> + * - send statusfd from initial pid namespace to ns2
> + * - read statusfd and observe:
> + * - that Pid will contain the pid of the process as seen from the init
> + * - that NSpid will contain the pids of the process for all descendant
> + * pid namespaces starting from the initial pid namespace
> + */
> static void pidfd_show_fdinfo(struct seq_file *m, struct file *f)
> {
> struct pid_namespace *ns = proc_pid_ns(file_inode(m->file));
> struct pid *pid = f->private_data;
> + pid_t nr = pid_nr_ns(pid, ns);
> +
> + seq_put_decimal_ull(m, "Pid:\t", nr);
> +
> +#ifdef CONFIG_PID_NS
> + seq_puts(m, "\nNSpid:");
> + if (nr == 0) {
> + /*
> + * If nr is zero the pid namespace of the procfs and the
> + * pid namespace of the pidfd are neither the same pid
> + * namespace nor are they ancestors. Since NSpid and Pid
> + * are always identical in their first entry, shortcut it
> + * and simply print 0.
> + */
> + seq_put_decimal_ull(m, "\t", nr);
> + } else {
> + int i;
> + for (i = ns->level; i <= pid->level; i++)
> + seq_put_decimal_ull(m, "\t", pid_nr_ns(pid, pid->numbers[i].ns));
> + }
> +#endif
>
> - seq_put_decimal_ull(m, "Pid:\t", pid_nr_ns(pid, ns));
> seq_putc(m, '\n');
> }
> #endif
> --
> 2.23.0
>
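
For anyone who wants to check the "first NSpid entry equals Pid" property
from userspace, here is a minimal sketch of a parser for the two fields as
the patch above prints them ("Pid:\t<nr>" followed by "NSpid:" and
tab-separated entries). The names parse_pidfd_fdinfo, struct pidfd_info and
MAX_NSPID are hypothetical and not part of the patch; the cap of 32 entries
is an assumption about the maximum pid namespace nesting depth. It works on
a buffer that the caller has already read from /proc/self/fdinfo/<pidfd>.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical cap on parsed NSpid entries (assumed nesting limit). */
#define MAX_NSPID 32

/* Hypothetical container for the two fields printed by the patch. */
struct pidfd_info {
	pid_t pid;               /* value of the "Pid:" line */
	pid_t nspid[MAX_NSPID];  /* entries of the "NSpid:" line, in order */
	size_t nr_nspid;         /* number of valid entries in nspid[] */
};

/*
 * Parse the "Pid:" and "NSpid:" lines out of fdinfo text.
 * Returns 0 if a "Pid:" line was found, -1 otherwise. A missing
 * "NSpid:" line (e.g. no CONFIG_PID_NS) simply leaves nr_nspid at 0.
 */
static int parse_pidfd_fdinfo(const char *text, struct pidfd_info *out)
{
	const char *line = text;
	int have_pid = 0;

	assert(text && out);
	memset(out, 0, sizeof(*out));

	while (*line) {
		if (strncmp(line, "Pid:", 4) == 0) {
			out->pid = (pid_t)strtol(line + 4, NULL, 10);
			have_pid = 1;
		} else if (strncmp(line, "NSpid:", 6) == 0) {
			const char *p = line + 6;

			while (out->nr_nspid < MAX_NSPID) {
				char *end;
				long v;

				/* Skip separators, but stop at end of line:
				 * strtol() would otherwise skip the newline. */
				while (*p == ' ' || *p == '\t')
					p++;
				if (*p == '\n' || *p == '\0')
					break;

				v = strtol(p, &end, 10);
				if (end == p)
					break; /* not a number, give up on this line */
				out->nspid[out->nr_nspid++] = (pid_t)v;
				p = end;
			}
		}

		/* Advance to the start of the next line, if any. */
		line = strchr(line, '\n');
		if (!line)
			break;
		line++;
	}

	return have_pid ? 0 : -1;
}
```

With this, a test program could read the fdinfo file into a buffer, call
parse_pidfd_fdinfo(), and compare info.pid against info.nspid[0]. In the
"not a descendant" case described above, both would be expected to be 0
with exactly one NSpid entry.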