Re: [RFC][PATCH 0/3] fork: Add the ability to create tasks with given pids

From: Pedro Alves
Date: Fri Nov 25 2011 - 17:36:53 EST


On Friday 25 November 2011 17:03:26, Pavel Emelyanov wrote:
> On 11/25/2011 08:54 PM, Oleg Nesterov wrote:
> > How you can restore the multithread tracee?
>
> Don't know :) But if this approach sounds promising (I see, that now it's not, but...) I
> can think more on it.
>
> > You need to unreserve/reserve the previous pid, and we have the same problems again, no?
>
> With the existing patch - yes, but as I said above - we need to decide which direction to
> go and then I'll think further.

Thanks for thinking about all this. Being able to reserve pids would be
nice, but I won't pretend to know the kernel's internals well enough to
suggest a sane and acceptable way to do it. We'd have to be able
to restore multi-threaded tracees (which would also mean that there are
pids which are leaders and others which are clones), and we'd have to support
a single-threaded tracer debugging (and spawning) more than one process,
while not all tracees are involved in C/R. Maybe this (reservation) issue
should be considered an orthogonal mechanism for now.

> By now your opinion is to better stay where we are ;) but if moving is unavoidable, then
> it's better to take the CLONE_CHILD_USEPIDS route. That's my position as well.

From the perspective of a client that is going to use this on a live
system, CLONE_CHILD_USEPIDS seems a little better, in that the pid race is
only against another task reusing the same pid, while with setting last_pid,
you have a try/whoops-not-the-pid-I-want/kill/retry/rinse/repeat loop
racing against all fork/clone's in the system, along with possibly
needing to first do a kill(PID, 0) to check whether the PID is
available (unless setting last_pid already detects that).
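
To make the kill(PID, 0) probe concrete, here is a rough userspace sketch
of what I have in mind. The helper name pid_in_use() is made up for
illustration and is not part of any patch in this thread. It relies only on
the documented behaviour of kill(2) with signal 0: no signal is sent, but
the existence and permission checks still run.

```c
#define _POSIX_C_SOURCE 200809L

#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: probe whether a pid is currently taken.
 * Returns 1 if a process with that pid exists, 0 if none was found,
 * and -1 on an unexpected error.
 */
static int pid_in_use(pid_t pid)
{
    if (kill(pid, 0) == 0)
        return 1;   /* process exists and we may signal it */
    if (errno == EPERM)
        return 1;   /* process exists, but we lack permission */
    if (errno == ESRCH)
        return 0;   /* no such process */
    return -1;      /* e.g. EINVAL; should not happen with signal 0 */
}
```

Note that the answer is only advisory: another fork/clone can grab the pid
between the probe and the restore attempt, so the probe narrows the race
window but does not close it.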

BTW, it's not only GDB that would want this for live systems.
Check out Berkeley Lab's C/R (https://ftg.lbl.gov/projects/CheckpointRestart/),
where these guys use mixed kernel/userspace C/R in clusters for high-end
scientific computing to e.g., migrate tasks between nodes, and pause/resume
parallel MPI jobs (on live systems). (Apologies if everyone already knows
about this :-) .)

From what I read in their papers, in their approach, from userspace, they
spawn new children as usual, with whatever pids the kernel wants, and then
afterwards (from userspace, but through a kernel module), magically change
the process's and threads' pids to the pids they really want. They also fix
up the parent pids and session ids after the fact, along the way.

--
Pedro Alves