Re: [RFC][PATCH 0/3] fork: Add the ability to create tasks with given pids

From: Pavel Emelyanov
Date: Mon Nov 28 2011 - 05:39:08 EST


On 11/27/2011 10:47 PM, Tejun Heo wrote:
> Hello, Pavel.
>
> On Fri, Nov 25, 2011 at 02:14:56PM +0400, Pavel Emelyanov wrote:
>> OK, here's another proposal that seems to suit all of us:
>>
>> 1. me wants to clone tasks with pids set
>> 2. Pedro wants to fork task with not changing pids and w/o root perms
>> 3. Oleg and Tejun want to have little intrusion into fork() path
>>
>> The proposal is to implement the PR_RESERVE_PID prctl, which allocates a pid and
>> puts it on current. The subsequent fork() uses this pid, and the pid survives and
>> keeps its bit in the pidmap after detach. A 2nd fork() after the 1st task's death
>> can thus reuse the same pid. This basic mode doesn't require root perms at all
>> and is safe against pid reuse problems. When requesting a reservation, a task may
>> specify the pid number it wants, but this requires root perms (CAP_SYS_ADMIN).
>>
>> Pedro, I suppose this will work for your checkpoint feature in gdb, am I right?
>>
>> Few comments about intrusion:
>>
>> * the common path - if (pid != &init_struct_pid) - on fork is just modified
>> * we have -1 argument to copy_process
>> * one more field on struct pid is OK, since its size doesn't change (a 32-bit level
>> is not required anyway, it's OK to reduce it down to 16 bits)
>> * no clone flags extension
>> * no new locking - the reserved pid manipulations happen under tasklist_lock and
>> existing common paths do not require more of it
>> * yes, we have +1 member on task_struct :(
>>
>> Current API problems:
>>
>> * Only one fork() with pid at a time. Next call to PR_RESERVE_PID will kill the
>> previous reservation (don't know how to fix)
>> * No way to fork() an init of a pid sub-namespace with a desired pid in current
>> (can be fixed with a flag for PR_RESERVE_PID saying that we need a pid for a
>> namespace of the next level)
>> * No way to grab existing pid for reserve (can be fixed, if someone wants this)
>>
>> Oleg, Tejun, do you agree with such an approach?
>
> Hmmm... Any attempt to reserve PIDs without full control over the
> namespace is futile. It can never be complete / reliable.

Why? What's the _real_ problem with the

	pid = prctl(PR_RESERVE_PID, 0); /* let the kernel _generate_ a pid for us */
	while (1) {
		real_pid = fork();
		if (real_pid == 0)
			return do_child();

		/* parent only: every child must get the reserved pid */
		BUG_ON(pid != real_pid);
		wait(NULL);
	}

model? Let's temporarily forget about the single reserved pid implementation
limitation and concentrate on the approach itself.
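
To make the intended semantics concrete, here is a toy userspace model of the
loop above. It is only a sketch, not kernel code: every name in it
(reserve_pid, model_fork, model_exit, run_model, the 64-entry pidmap) is made
up for illustration, and the lowest-free-number allocator is a simplification
of how the real pidmap hands out numbers. It models just two properties: the
reserved pid keeps its pidmap bit across the child's exit, and an unrelated
allocation can never land on it.

```c
#include <stdbool.h>

#define PID_MAX 64		/* tiny pid space, for illustration only */

static bool pidmap[PID_MAX];	/* true = pid number is taken */
static int reserved_pid = -1;	/* the single reservation held by "current" */

/* Hand out the lowest free pid (simplified); pid 0 is never used. */
int alloc_pid(void)
{
	for (int nr = 1; nr < PID_MAX; nr++) {
		if (!pidmap[nr]) {
			pidmap[nr] = true;
			return nr;
		}
	}
	return -1;
}

/* Model of prctl(PR_RESERVE_PID, 0): the "kernel" picks the pid. */
int reserve_pid(void)
{
	if (reserved_pid > 0)
		pidmap[reserved_pid] = false;	/* new reservation kills the old one */
	reserved_pid = alloc_pid();
	return reserved_pid;
}

/* Model of fork(): use the reservation if there is one. */
int model_fork(void)
{
	return reserved_pid > 0 ? reserved_pid : alloc_pid();
}

/* Model of detach on exit: a reserved pid keeps its pidmap bit. */
void model_exit(int nr)
{
	if (nr != reserved_pid)
		pidmap[nr] = false;
}

/* Run the fork/wait loop; return how many rounds got the reserved pid. */
int run_model(int rounds)
{
	int pid = reserve_pid();
	int hits = 0;

	for (int i = 0; i < rounds; i++) {
		int child = model_fork();
		int other = alloc_pid();	/* unrelated task forking meanwhile */

		if (child == pid && other != pid)
			hits++;
		model_exit(child);
		model_exit(other);
	}
	return hits;
}
```

In this model run_model(3) returns 3: the child gets the reserved number in
every round and the competing allocation never does.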

> Let's just
> forget about it. If anyone, including gdb, wants to have fun with CR,
> let them manage namespace too; otherwise, it's never gonna be
> reliable.
>
> If you take the above out, setting last_pid is as simple as it gets
> and good enough. It's essentially few tens of lines of code to add
> userland interface for setting one pid_t value. Let's restrict
> manipulation to root for now and see whether finer grained CAP_* makes
> sense as we go along.

That's OK with me, and I'll send the patches soon, but I'd still like to hear a
sane explanation of the above.

> Thanks.
>
