Re: (pspace,pid) vs true pid virtualization

From: Hubertus Franke
Date: Thu Feb 16 2006 - 22:35:34 EST


Serge E. Hallyn wrote:
Quoting Eric W. Biederman (ebiederm@xxxxxxxxxxxx):

"Serge E. Hallyn" <serue@xxxxxxxxxx> writes:
With respect to pids lets not get caught up in the implementation
details. Let's first get clear on what the semantics should be.

- Should the first pid in a pid space have pid 1?
That should only be required if you have system containers,
or if there are tools or requirements that say that if I wander up my
process tree I must ultimately end up at 1.

- Should pid == 1 ignore signals it doesn't have a handler for?

Yes

- Should any children of pid 1 be allowed to live when pid == 1 is killed?
No .. it has init semantics !



But doesn't that depend on whether we use (pspace,pid) or vpids? If
vpids, then init isn't really a problem, since from kernelspace
processes in a container still have a global pid and global parent, and
init knows them. If (pspace,pid), then we need a fake init because the
real init doesn't know about the processes in the container...


- Should a process have some sort of global (on the machine) identifier?

First establish whether that global ID has to be persistent ...
I don't see why ! In which case the TASK_REF is the perfect global ID.


I think to satisfy openvz's existing customers this must be a yes. With
vpid the answer is simple. With (pspace,pid), there are three answers I've
heard, namely

1. just use pspaceid, pid
2. make pspaceid small and use (pspaceid << SOMEBITS | pid)
3. use pid1/pid2/pid3 where pid1 is creator of pid and its
pspace, etc...
This implies that pid2 can be looked up in the context of pid1.
In the OpenVZ approach that's possible. In pspaces, isn't that the wpid?


But the openvz guys also don't want userspace tool changes, making (2)
the most likely option. Any other ideas?
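
For concreteness, option (2) might look something like the sketch below.
The 22/9 bit split, the PSPACE_* names and the helper functions are all
made up for illustration; nothing here comes from an existing patch. The
split keeps the packed value within 31 bits so that it stays positive when
stored in a signed pid_t, which is the point of not changing userspace
tools.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: 9 bits of pspace id above 22 bits of pid.
 * PSPACE_PID_BITS plays the role of SOMEBITS in option (2). */
#define PSPACE_PID_BITS 22u
#define PSPACE_ID_BITS  9u
#define PSPACE_PID_MASK ((UINT32_C(1) << PSPACE_PID_BITS) - 1)

/* Pack (pspaceid, pid) into one machine-wide identifier. */
static inline uint32_t global_id_pack(uint32_t pspaceid, uint32_t pid)
{
    assert(pid <= PSPACE_PID_MASK);
    assert(pspaceid < (UINT32_C(1) << PSPACE_ID_BITS));
    return (pspaceid << PSPACE_PID_BITS) | pid;
}

/* Recover the pspace id from a packed identifier. */
static inline uint32_t global_id_pspace(uint32_t gid)
{
    return gid >> PSPACE_PID_BITS;
}

/* Recover the pspace-local pid from a packed identifier. */
static inline uint32_t global_id_pid(uint32_t gid)
{
    return gid & PSPACE_PID_MASK;
}
```

With this layout, pspace 0 maps to the plain pid, so tools looking at the
default pid space would see no change.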


- Should the pids in a pid space be visible from the outside?


Again, the openvz guys say yes.

I think it should be acceptable if a pidspace is visible in all its
ancestor pidspaces. I.e. if I create pspace2 and pspace3 from pid 234
in pspace1, then pspace2 doesn't need to be able to address pspace3
and vice versa.

That means you need to do a more complicated lookup! For instance, let's
say you have this hierarchy:

pspace1
 |--->pspace2
 |      |--->pspace2a
 |--->pspace3
        |--->pspace3a

Let's assume we use the (pspaceid << BITS | pid) global id. To verify, I have
to ensure that the target pid can reach the originating pid in its ancestor
path. Not a big deal, as these pspace trees probably won't get much deeper
than 3 or 4 levels.
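
A rough sketch of that ancestor-path check, assuming each pspace just
carries a parent pointer. The struct layout and the pspace_can_see() name
are invented for this example and are not from any posted patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical pspace node: only the parent link matters here. */
struct pspace {
    int id;
    struct pspace *parent;   /* NULL for the top-level pspace */
};

/* Return true if 'viewer' is 'target' itself or one of its ancestors,
 * i.e. if a process in 'viewer' is allowed to address pids in 'target'.
 * The walk costs one step per level of nesting. */
static bool pspace_can_see(const struct pspace *viewer,
                           const struct pspace *target)
{
    assert(viewer != NULL);
    for (const struct pspace *p = target; p != NULL; p = p->parent) {
        if (p == viewer)
            return true;
    }
    return false;
}
```

On the hierarchy above, pspace1 can see everything, pspace3 can see
pspace3a, and pspace2 and pspace3 cannot see each other.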


Kirill, is that acceptable?


- Should the parent of pid 1 be able to wait for it or its children?

Yes.

Yes ... VPID does that and wpid in pspace does that as well.


- Is a completely disjoint pid space acceptable to anyone?


To anyone? yes :)

To everyone, I don't think so.

hehh... yes, they should be disjoint other than at the top,
where we want to wait ..

- What should the parent of pid == 1 see?

- Should a process not in the default pid space be able to create another pid space?


Yes.

How else do you get hierarchy ....


This is to support using pidspaces for vservers, and creating
migrateable sub-pidspaces in each vserver.


- Should we be able to monitor a pid space from the outside?


To some extent, yes.


- Should we be able to have processes enter a pid space?


IMO that is crucial.

Existing ones .. now that is potentially difficult to do. Particularly
if you want to enter a pidspace that has already been migrated,
because one's assigned pid might already be taken in the target pspace.



- Do we need to be able to ptrace/kill individual processes
in a pid space, from the outside, and why?


I think this is completely unnecessary so long as a process can enter a
pidspace.


- After migration what identifiers should the tasks have?


It must be possible to retain the same pids, at least from inside the
container.

Absolutely .. otherwise all cached pids in userspace are meaningless.


So this is irrelevant, as the openvz approach can just virtualize the
old pid, while (pspace, pid) will be able to create a new container and
use the old pid values, which are then guaranteed to not be in use.

Exactly .. moot issue, this is "trivial" as long as you can fork with
a particular pid.
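
To illustrate why a fresh pspace makes this easy, and why entering an
already populated one is harder, here is a toy per-pspace pid table with a
"claim exactly this pid" operation. Everything in it (toy_pspace,
TOY_PID_MAX, toy_pspace_claim) is a made-up stand-in, not the kernel's pid
allocator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TOY_PID_MAX 1024   /* arbitrary small limit for the example */

/* Hypothetical per-pspace record of which pids are in use. */
struct toy_pspace {
    uint8_t used[TOY_PID_MAX / 8];   /* one bit per pid */
};

/* Start with an empty pspace, as a restart would create. */
static void toy_pspace_init(struct toy_pspace *ps)
{
    assert(ps != NULL);
    memset(ps->used, 0, sizeof ps->used);
}

/* Try to claim exactly 'pid' in 'ps'. Returns false if the pid is out
 * of range or already taken; the caller cannot fall back to another
 * pid without breaking pids cached in userspace. */
static bool toy_pspace_claim(struct toy_pspace *ps, int pid)
{
    assert(ps != NULL);
    if (pid < 1 || pid >= TOY_PID_MAX)
        return false;
    uint8_t bit = (uint8_t)(1u << (pid % 8));
    if (ps->used[pid / 8] & bit)
        return false;
    ps->used[pid / 8] |= bit;
    return true;
}
```

In a freshly created pspace every claim for a distinct old pid succeeds.
A process entering an existing pspace hits the false branch as soon as its
pid collides with one already in use there.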



If we can answer these kinds of questions we can likely focus in
on what the implementation should look like. So far I have not
seen a question that could not be implemented with a (pspace, pid)/pid
or a vpid/pid implementation.


But you have, haven't you? Namely, how can openvz provide its
customers with a global view of all processes without putting 5 years of
work into a new sysadmin interface?

-serge


