Re: [ckrm-tech] Re: [RFC][patch 00/21] PID Virtualization: Overview and Patches

From: Gerrit Huizenga
Date: Thu Dec 15 2005 - 22:29:04 EST



On Thu, 15 Dec 2005 18:20:52 PST, Matt Helsley wrote:
> On Thu, 2005-12-15 at 11:49 -0800, Gerrit Huizenga wrote:
> > On Thu, 15 Dec 2005 09:35:57 EST, Hubertus Franke wrote:
> > > PID Virtualization is based on the concept of a container.
> > > The ultimate goal is to checkpoint/restart containers.
> > >
> > > The mechanism to start a container
> > > is to 'echo "container_name" > /proc/container' which creates a new
> > > container and associates the calling process with it. All subsequently
> > > forked tasks then belong to that container.
> > > There is a separate pid space associated with each container.
> > > Only processes/tasks belonging to the same container "see" each other.
> > > The exception is an implied default system container that has
> > > a global view.
>
> <snip>
>
> > I think perhaps this could also be the basis for a CKRM "class"
> > grouping as well. Rather than maintaining an independent class
> > affiliation for tasks, why not have a class devolve (evolve?) into
> > a "container" as described here. The container provides much of
> > the same grouping capabilities as a class as far as I can see. The
> > right information would be available for scheduling and IO resource
> > management. The memory component of CKRM is perhaps a bit tricky
> > still, but an overall strategy (can I use that word here? ;-) might
> > be to use these "containers" as the single intrinsic grouping mechanism
> > for vserver, openvz, application checkpoint/restart, resource
> > management, and possibly others?
> >
> > Opinions, especially from the CKRM folks? This might even be useful
> > to the PAGG folks as a grouping mechanism, similar to their jobs or
> > containers.
> >
> > "This patchset solves multiple problems".
> >
> > gerrit
>
> CKRM classes seem too different from containers to merge the two
> concepts:

I agree that the implementations of pid virtualization and classes have
different characteristics. You bring up interesting points about the
differences, but I question whether they are relevant to an
implementation of resource management. I'm going out on a limb here,
looking at a possibly radical change which might simplify things so
that there is only one grouping mechanism in the kernel. I could be
wrong, but...

> - Classes don't assign class-unique pids to tasks.

What part of this is important to resource management? A container
ID is like a class ID. Yes, I think container IDs are assigned to
processes rather than tasks, but is that really all that important?

> - Tasks can move between classes.

In the pid virtualization, I would think that tasks can move between
containers as well, although it isn't all that useful for most things.
For instance, checkpoint/restart needs to checkpoint a process and all
of its threads if it wants to restart it. So there may be restrictions
on what you can checkpoint/restart. Vserver probably wants isolation
at a process boundary, rather than a task boundary. Most resource
management, e.g. Java, probably doesn't care about task vs. process.

> - Tasks move between classes without any need for checkpoint/restart.

That *should* be possible with a generalized container solution.
For instance, just like with classes, you have to move things into
containers in the first place. And, you could in theory have a classification
engine that helped choose which container to put a task/process in
at creation/instantiation/significant event...
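
To make the classification engine idea a bit more concrete, here is a toy
user-space sketch in Python. It is not kernel code and not from the
patchset. The Task fields, the rule list and the container names are all
made up for illustration; the only point is "first matching rule picks the
container at creation time, otherwise fall back to a default".

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Task:
    # Hypothetical attributes a classifier might look at.
    comm: str   # command name
    uid: int    # owning user id


# Each rule is (predicate, container name); the first match wins.
Rule = Tuple[Callable[[Task], bool], str]

# Made-up example rules, not taken from CKRM or the patchset.
RULES: List[Rule] = [
    (lambda t: t.comm == "java", "appserver"),
    (lambda t: t.uid == 0, "system"),
]

DEFAULT_CONTAINER = "default"


def classify(task: Task, rules: List[Rule] = RULES) -> str:
    """Return the name of the container the task should be placed in."""
    for predicate, container in rules:
        if predicate(task):
            return container
    return DEFAULT_CONTAINER


if __name__ == "__main__":
    for t in (Task("java", 1000), Task("sshd", 0), Task("vim", 1000)):
        print(t.comm, "->", classify(t))
```

The same classify() call could in principle be rerun at a "significant
event" (exec, setuid, ...) to move the task/process to another container.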

> - Classes show up in a filesystem interface rather than using a file
> in /proc to create them. (trivial interface difference)

Yep - there will probably be a /proc or /configfs interface to containers
at some point, I would expect. No significant difference there.

> - There are no "visibility boundaries" to enforce between tasks in
> different classes.

Are there in virtualized pids? There *can* be - e.g. ps can distinguish,
but it is possible for tasks to interact across container boundaries.
That is not ideal for vserver or checkpoint/restart, for instance: it
makes c/r a little harder or more limited, since signals heading outside
the container may "disappear" when you checkpoint/restart. For apps that
c/r, though, that probably isn't all that likely.
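
For reference, the visibility rule as I read it from the overview (each
container has its own pid space, tasks only "see" tasks in the same
container, and the implied default container sees everything) can be
modelled in a few lines of Python. This is only a toy model under those
assumptions; the class names and the "system" container name are invented
here and say nothing about how the patches actually implement it.

```python
from dataclasses import dataclass
from typing import Dict, List

SYSTEM = "system"   # made-up name for the implied default container


@dataclass
class Task:
    global_pid: int    # pid as seen from the default container
    container: str     # container the task belongs to
    virtual_pid: int   # pid as seen from inside that container


class Containers:
    def __init__(self) -> None:
        self.tasks: List[Task] = []
        self.next_vpid: Dict[str, int] = {}   # per-container pid counter
        self.next_global = 1

    def spawn(self, container: str) -> Task:
        """Create a task with a pid drawn from the container's own space."""
        vpid = self.next_vpid.get(container, 1)
        self.next_vpid[container] = vpid + 1
        task = Task(self.next_global, container, vpid)
        self.next_global += 1
        self.tasks.append(task)
        return task

    def visible(self, viewer: Task) -> List[Task]:
        """Tasks the viewer can 'see' (what ps would list for it)."""
        if viewer.container == SYSTEM:
            return list(self.tasks)            # global view
        return [t for t in self.tasks if t.container == viewer.container]
```

Note that this only models what ps would show; it does nothing to stop a
task from signalling across the boundary, which is exactly the gap I am
describing above.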

> - Classes are hierarchical.

Conceptually they are. But are they in the CKRM f series? I thought
that was one area for simplification. And, how important is that *really*
for most applications?

> - Unless I am mistaken, a container groups processes (Can one thread run
> in container A and another in container B?) while a class groups tasks.
> Since a task represents a thread or a process one thread could be in
> class A and another in class B.

Definitely useful, and one question is whether pid virtualization is
container isolation, or simply virtualization to enable container
isolation. If it is an enabling technology, perhaps it doesn't have
that restriction and could be used either way based on resource management
needs or based on vserver or c/r needs...
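
The task vs. process distinction is easy to show with a toy sketch
(Python, user space, all names hypothetical): the only difference between
the two groupings below is whether membership is keyed by the thread id
or by the id of the process the thread belongs to.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Task:
    tid: int    # thread id (hypothetical field)
    tgid: int   # id of the process (thread group) the task belongs to


class TaskGrouping:
    """Class-like grouping: every task is assigned on its own."""

    def __init__(self) -> None:
        self.group_of: Dict[int, str] = {}   # tid -> group name

    def assign(self, task: Task, group: str) -> None:
        self.group_of[task.tid] = group

    def lookup(self, task: Task) -> Optional[str]:
        return self.group_of.get(task.tid)


class ProcessGrouping:
    """Container-like grouping: all threads of a process move together."""

    def __init__(self) -> None:
        self.group_of: Dict[int, str] = {}   # tgid -> group name

    def assign(self, task: Task, group: str) -> None:
        self.group_of[task.tgid] = group

    def lookup(self, task: Task) -> Optional[str]:
        return self.group_of.get(task.tgid)
```

With two threads of the same process, TaskGrouping lets one sit in "A"
and the other in "B", while ProcessGrouping puts both wherever the last
assign() sent the process. If pid virtualization is only the enabling
piece, either keying could sit on top of it.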

Debate away... ;-)

gerrit