Re: [Lse-tech] [PATCH] cpusets - big numa cpu and memory placement

From: Paul Jackson
Date: Sat Aug 07 2004 - 01:13:40 EST


Erich Focht wrote:
> we (NEC) are also a potential user of this patch

Good - welcome.


> I think cpusets and CKRM should be
> made to come together. One of CKRM's user interfaces is a filesystem
> with the file-tree representing the class hierarchy. It's the same for
> cpusets.

Hmmm ... this suggestion worries me, for a couple of reasons.

Just because cpusets and CKRM both have a hierarchy represented in a
file system doesn't mean it is, or can be, the same file system. Not
all trees are the same.

Perhaps someone more expert in CKRM can help here. The cpuset hierarchy
has some strict semantics:
1) Any cpuset's CPUs and Memory must be a subset of its parent's.
2) A cpuset may be exclusive for CPU or Memory only if its parent is.
3) A CPU or Memory exclusive cpuset may not overlap its siblings.

See the routine kernel/cpuset.c:validate_change() for the exact
coding of these rules.
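
As a rough illustration only, the three rules can be sketched in a few
lines of Python. This is not the code in validate_change(): the Cpuset
class, its field names, and the violations() helper are made up for this
sketch, CPUs and Memory Nodes are modelled as plain Python sets, and
rule 3 is read as applying when either of two siblings is exclusive.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass(eq=False)
class Cpuset:
    """Hypothetical stand-in for a cpuset; not the kernel's structure."""
    name: str
    cpus: Set[int]
    mems: Set[int]
    cpu_exclusive: bool = False
    mem_exclusive: bool = False
    parent: Optional["Cpuset"] = None
    children: List["Cpuset"] = field(default_factory=list)

    def __post_init__(self):
        # Register with the parent so siblings can be found later.
        if self.parent is not None:
            self.parent.children.append(self)


def violations(cs: Cpuset) -> List[str]:
    """Return the rules (1, 2, 3 above) that cpuset `cs` breaks, if any."""
    problems: List[str] = []
    parent = cs.parent
    if parent is None:
        return problems  # the root has no parent or siblings to check

    # Rule 1: CPUs and Memory must be a subset of the parent's.
    if not (cs.cpus <= parent.cpus and cs.mems <= parent.mems):
        problems.append("rule 1: not a subset of parent")

    # Rule 2: exclusive only if the parent is exclusive.
    if (cs.cpu_exclusive and not parent.cpu_exclusive) or \
       (cs.mem_exclusive and not parent.mem_exclusive):
        problems.append("rule 2: exclusive but parent is not")

    # Rule 3 (assumed reading): if either sibling is exclusive,
    # the two may not overlap in that resource.
    for sib in parent.children:
        if sib is cs:
            continue
        if (cs.cpu_exclusive or sib.cpu_exclusive) and cs.cpus & sib.cpus:
            problems.append("rule 3: CPUs overlap sibling " + sib.name)
        if (cs.mem_exclusive or sib.mem_exclusive) and cs.mems & sib.mems:
            problems.append("rule 3: Memory overlaps sibling " + sib.name)
    return problems
```

A hierarchy shared with CKRM would have to keep a check of this kind
passing for every change, whatever other rules CKRM layered on top.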

If we followed your suggestion, Erich, would these rules still hold?
I can't imagine that the CKRM folks have any existing hierarchies with
these particular rules. They would need to if we went this way.

On the flip side, what additional rules, if any, would CKRM impose
on this hierarchy?

The other reason that this suggestion worries me is a bit more
philosophical. I'm sure that for all the other well-known
resources that CKRM manages, no one is proposing to replace whatever
existing names and mechanisms exist for those resources, such as
bandwidth, compute cycles, memory, ... Rather, I presume that CKRM
provides an additional resource management layer on top of the
existing resources, which retain their classic names and apparatus.

What you seem to be suggesting here, especially with this nice
picture from your next post:

> The files in cpusets are:
> - cpus: list of CPUs in that cpuset
> - mems: list of Memory Nodes in that cpuset
> - cpu_exclusive flag: is cpu placement exclusive?
> - mem_exclusive flag: is memory placement exclusive?
> - tasks: list of tasks (by pid) attached to that cpuset
> The files in a CKRM class directory:
> - stats : statistics (not needed for cpusets)
> - shares : could contain cpus, mems, cpu_exclusive, mem_exclusive
> - members : same as reading /dev/cpusets/.../tasks
> - target : same as writing /dev/cpusets/.../tasks
>
> Changing the "shares" would mean something like
> echo "cpus +6-10" > .../shares

would remove the cpuset-specific interface forever, leaving it
visible only via a more generic "shares, members, target" interface
suitable for abstract resource management.
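
To make the comparison concrete, here is a hedged Python sketch of what
a write such as the quoted "cpus +6-10" might have to do behind a generic
"shares" file. The command syntax is only what the quoted example hints
at ("something like"), so the parsing rules, the "-" removal form, and
both function names are assumptions of this sketch, not an existing CKRM
or cpuset interface.

```python
from typing import Set


def parse_cpu_list(text: str) -> Set[int]:
    """Parse a list such as "0-3,6,8-10" into a set of CPU numbers."""
    cpus: Set[int] = set()
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-", 1)
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def apply_shares_write(cpus: Set[int], command: str) -> Set[int]:
    """Apply a hypothetical "cpus +LIST" or "cpus -LIST" write to a CPU set."""
    resource, _, change = command.strip().partition(" ")
    change = change.strip()
    # Only the "cpus" keyword with a leading "+" or "-" is handled here.
    if resource != "cpus" or change[:1] not in ("+", "-"):
        raise ValueError("unsupported shares command: " + repr(command))
    delta = parse_cpu_list(change[1:])
    return cpus | delta if change[0] == "+" else cpus - delta
```

Every resource folded into "shares" would need its own keyword and its
own delta syntax of this sort, which is one more thing for a new user to
learn on top of the cpus and mems lists themselves.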

I am afraid that this would make it harder for new users of cpusets to
figure them out. Cpusets by themselves already add a new and strange
layer of abstraction that will require a little bit of head scratching
(as Martin Bligh can testify to, from recent experience ;) for those
administering and managing the big iron where cpusets will be useful.

To add yet another layer of abstractions on top of that, from the CKRM
world, might send quite a few users into mental overload, doing the
usual stupid things we all do when we have given up on understanding and
are just thrashing about, trying to get something to work.

I think we are onto something useful here, the hierarchical organizing
of compute resources of CPU and Memory, which will become increasingly
relevant in the coming years, with bigger machines and more complex
compute and memory architectures.

I'd hate to see cpusets hidden behind resource management terms from day
one.

And, looking at it from the CKRM side (not sure I can, I'll try ...)
would it not seem a bit odd to a CKRM user that just one of the resource
types managed, these cpusets, had no apparent existence outside of the
CKRM hierarchy, unlike all the other resources, which existed a priori,
and, I presume, continue their independent existence?

Obviously, I could use a little CKRM expertise here.

But my inclination is to continue to view these two projects as separate,
with the potential that CKRM will someday add cpusets to the resource types
that it can manage.

Thank you.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@xxxxxxx> 1.650.933.1373