Re: [Lse-tech] [PATCH] cpusets - big numa cpu and memory placement

From: Paul Jackson
Date: Fri Aug 06 2004 - 03:41:45 EST


Martin wrote:
> Sorry, it's just a little difficult to dive into a large
> patch without a higher level idea ...

No need to apologize. I welcome your review. I was well aware that the
next step for this patch was the "why would I want this ..." explanation.
Let me know if there is more I can explain.


> I don't think the kernel should have to deal with that
> [cpu and node number virtualization] stuff.

I agree, now.


> The other thing that seems to glare at me is the overlap
> between what you have here and PAGG/CKRM. Does
> either cpusets depend on PAGG/CKRM or vice versa?

None of these three (cpusets, PAGG or CKRM) depends on the others, with the
possible exception that perhaps CKRM could make use of PAGG (whether it
does or not, or whether it should, I don't know - ask them).

Cpusets control _where_ a process can run and allocate. The central
construct of cpusets is essentially a "soft partition" -- a set of CPUs
and Memory Nodes. These can be arranged in a hierarchy, with names,
permissions, and a couple of control bits. Tasks can be moved between
cpusets, as allowed by the permission model. You can attach a task to a
different cpuset if (1) you can access that cpuset (search permission to
some directory beneath /dev/cpuset), (2) write that cpuset's "tasks"
file, and (3) have essentially kill rights on the task being placed.
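
As a rough illustration of step (2), here is a minimal sketch that assumes
a task is attached by writing its pid to the cpuset's "tasks" file. The
helper name attach_task, the root parameter, and the one-pid-per-write
format are illustrative assumptions, not details taken from the patch:

```python
import os


def attach_task(cpuset_path, pid, root="/dev/cpuset"):
    """Request that task `pid` be moved into the cpuset at `cpuset_path`.

    Assumption: the move is requested by writing the pid, as decimal
    text, to the "tasks" file in that cpuset's directory. The kernel
    side would apply the permission checks; this helper only performs
    the write and lets any OSError propagate to the caller.
    """
    tasks_file = os.path.join(root, cpuset_path.strip("/"), "tasks")
    with open(tasks_file, "w") as f:
        f.write(f"{pid}\n")
    return tasks_file
```

Under those assumptions, attach_task("batch/job1", 1234) would write
"1234" to /dev/cpuset/batch/job1/tasks, where "batch/job1" is a made-up
cpuset name.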

Just as your basic file system provides a hierarchical model for
organizing your data files (places to put data), similarly cpusets
provides a hierarchical model for organizing the nodes on your big numa
box (places to run tasks).

CKRM tracks _how_ much of various interesting resources tasks are using,
both measuring and limiting such usage. It provides a way to manage
some of the shared system resources, such as CPU time, memory pages, I/O
and incoming network bandwidth based on user defined groups of tasks
called classes (quoting from http://ckrm.sourceforge.net/). Unlike
cpusets and most of the rest of the kernel, CKRM doesn't just manage
individual tasks, one task at a time, but manages based on a dynamically
determined resource class it assigns to various kernel objects in
addition to tasks.

Cpusets provides a rich model of just the CPU and Memory resources, but
only manages tasks, using the traditional simple task pointer to a
shared reference counted object.
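
The "task pointer to a shared reference counted object" pattern can be
pictured with the toy model below. It is only a sketch: the class names,
fields, and the attach method are hypothetical, and the real kernel
structures and locking are not shown.

```python
class Cpuset:
    """Toy stand-in for a cpuset: a named set of CPUs and memory nodes."""

    def __init__(self, name, cpus, mems):
        self.name = name
        self.cpus = frozenset(cpus)
        self.mems = frozenset(mems)
        self.refcount = 0  # number of tasks currently pointing here


class Task:
    """Toy task holding a single pointer to its (shared) cpuset."""

    def __init__(self, pid, cpuset):
        self.pid = pid
        self.cpuset = None
        self.attach(cpuset)

    def attach(self, cpuset):
        # Take the new reference before dropping the old one, so that
        # re-attaching to the same cpuset never drives its count to zero.
        cpuset.refcount += 1
        if self.cpuset is not None:
            self.cpuset.refcount -= 1
        self.cpuset = cpuset
```

Many tasks share one Cpuset object, and moving a task is just a pointer
swap plus two counter updates.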

CKRM provides a rich structure for classifying a variety of kernel
objects, not just tasks, and managing their use, but it doesn't have a
particularly fancy model of any one of these resources (so far as I
know anyway ...).

PAGG is just a mechanism that is useful for job accounting and resource
management. It's just the hooks - for an inescapable job container and
hooks for loadable modules to be called on key events in the life of
tasks in that container, such as fork and exit. PAGG provides a useful
mechanism for certain kinds of resource management and system accounting
modules, but is itself not a resource manager.
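
To make the "container plus hooks" idea concrete, here is a toy model,
not PAGG's actual interface. The class JobContainer, its method names,
and the callback signature are all hypothetical:

```python
class JobContainer:
    """Toy job container: children join on fork and cannot leave it."""

    def __init__(self, name, first_pid):
        self.name = name
        self.pids = {first_pid}
        self.hooks = {"fork": [], "exit": []}

    def register(self, event, callback):
        # A "module" registers callback(container, pid) for an event.
        self.hooks[event].append(callback)

    def fork(self, parent_pid, child_pid):
        if parent_pid not in self.pids:
            raise ValueError("parent is not in this container")
        # The child inherits the parent's container automatically.
        self.pids.add(child_pid)
        for callback in self.hooks["fork"]:
            callback(self, child_pid)

    def exit(self, pid):
        self.pids.remove(pid)
        for callback in self.hooks["exit"]:
            callback(self, pid)
```

The container itself makes no policy decisions. An accounting or
resource management module would do its work inside the callbacks it
registers.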

Hopefully, I haven't misrepresented CKRM and PAGG too much.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@xxxxxxx> 1.650.933.1373