Re: [PATCH v7 1/2] fork: extend clone3() to support setting a PID

From: Eric W. Biederman
Date: Mon Nov 11 2019 - 18:09:23 EST


Christian Brauner <christian.brauner@xxxxxxxxxx> writes:

> On Mon, Nov 11, 2019 at 04:40:28PM +0100, Adrian Reber wrote:
>> On Mon, Nov 11, 2019 at 04:25:15PM +0100, Oleg Nesterov wrote:
>> > On 11/11, Adrian Reber wrote:
>> > >
>> > > v7:
>> > > - changed set_tid to be an array to set the PID of a process
>> > > in multiple nested PID namespaces at the same time as discussed
>> > > at LPC 2019 (container MC)
>> >
>> > cough... iirc you convinced me this is not needed when we discussed
>> > the previous version ;) Nevermind, probably my memory fools me.
>>
>> You are right. You suggested the same thing and we didn't listen ;)
>>
>> > So far I only have some cosmetic nits,
>>
>> Thanks for the quick review. I will try to apply your suggestions.
>>
>> > > @@ -175,6 +187,18 @@ struct pid *alloc_pid(struct pid_namespace *ns)
>> > >
>> > > for (i = ns->level; i >= 0; i--) {
>> > > int pid_min = 1;
>> > > + int t_pos = 0;
>> > ^^^^^
>> >
>> > I won't insist, but I'd suggest to cache set_tid[t_pos] instead to make
>> > the code a bit more simple.
>> >
>> > > @@ -186,12 +210,24 @@ struct pid *alloc_pid(struct pid_namespace *ns)
>> > > if (idr_get_cursor(&tmp->idr) > RESERVED_PIDS)
>> > > pid_min = RESERVED_PIDS;
>> >
>> > You can probably move this code into the "else" branch below.
>> >
>> > IOW, something like
>> >
>> >
>> > for (i = ns->level; i >= 0; i--) {
>> > int xxx = 0;
>> >
>> > if (set_tid_size) {
>> > int pos = ns->level - i;
>> >
>> > xxx = set_tid[pos];
>> > if (xxx < 1 || xxx >= pid_max)
>> > return ERR_PTR(-EINVAL);
>> > /* Also fail if a PID != 1 is requested and no PID 1 exists */
>> > if (xxx != 1 && !tmp->child_reaper)
>> > return ERR_PTR(-EINVAL);
>> > if (!ns_capable(tmp->user_ns, CAP_SYS_ADMIN))
>> > return ERR_PTR(-EPERM);
>> > set_tid_size--;
>> > }
>> >
>> > idr_preload(GFP_KERNEL);
>> > spin_lock_irq(&pidmap_lock);
>> >
>> > if (xxx) {
>> > nr = idr_alloc(&tmp->idr, NULL, xxx, xxx + 1,
>> > GFP_ATOMIC);
>> > /*
>> > * If ENOSPC is returned it means that the PID is
>> > * alreay in use. Return EEXIST in that case.
>> > */
>> > if (nr == -ENOSPC)
>> > nr = -EEXIST;
>> > } else {
>> > int pid_min = 1;
>> > /*
>> > * init really needs pid 1, but after reaching the
>> > * maximum wrap back to RESERVED_PIDS
>> > */
>> > if (idr_get_cursor(&tmp->idr) > RESERVED_PIDS)
>> > pid_min = RESERVED_PIDS;
>> > /*
>> > * Store a null pointer so find_pid_ns does not find
>> > * a partially initialized PID (see below).
>> > */
>> > nr = idr_alloc_cyclic(&tmp->idr, NULL, pid_min,
>> > pid_max, GFP_ATOMIC);
>> > }
>> >
>> > ...
>> >
>> > This way only the "if (set_tid_size)" block has to play with set_tid_size/set_tid.
>> >
>> > note also that this way we can easily allow set_tid[some_level] == 0, we can
>> > simply do
>> >
>> > - if (xxx < 1 || xxx >= pid_max)
>> > + if (xxx < 0 || xxx >= pid_max)
>> >
>> > although I don't think this is really useful.
>>
>> Yes. I explicitly didn't allow 0 as a PID as I didn't thought it would
>> be useful (or maybe even valid).

I agree not allowing 0 sounds very reasonable.

> How do you express: I don't care about a specific pid in pidns level
> <n>, just give me a random one? For example,
>
> set_tid[0] = 1234
> set_tid[1] = 5678
> set_tid[2] = random_pid()
> set_tid[3] = 9
>
> Wouldn't that be potentially useful?

I can't imagine how.

At least in my head the fundamental concept is picking up a container on
one machine and moving it to another machine. For that operation you
will know starting with the most nested pid namespace the pids that you
want up to some point. Farther up you don't know.

I can't imagine in what scenario you would not know a pid at outer level
but want a specified pid at an ever farther removed outer level. What
scenario are you thinking about that could lead to such a situation?

For the me the question is: Are you restoring what you know or not?

Eric