Re: [Lse-tech] [PATCH] cpusets - big numa cpu and memory placement

From: Matthew Dobson
Date: Mon Feb 07 2005 - 19:01:48 EST


Matthew Dobson wrote:
On Sun, 2004-10-03 at 16:53, Martin J. Bligh wrote:

Martin wrote:

Matt had proposed having a separate sched_domain tree for each cpuset, which
made a lot of sense, but seemed harder to do in practice because "exclusive"
in cpusets doesn't really mean exclusive at all.

See my comments on this from yesterday on this thread.

I suspect we don't want a distinct sched_domain for each cpuset, but
rather a sched_domain for each of several entire subtrees of the cpuset
hierarchy, such that every CPU is in exactly one such sched domain, even
though it be in several cpusets in that sched_domain.

Mmmm. The fundamental problem I think we ran across (just whilst pondering,
not in code) was that some things (eg ... init) are bound to ALL cpus (or
no cpus, depending how you word it); i.e. they're created before the cpusets
are, and are a member of the grand-top-level-uber-master-thingummy.

How do you service such processes? That's what I meant by the exclusive
domains aren't really exclusive.

Perhaps Matt can recall the problems better. I really liked his idea, aside
from the small problem that it didn't seem to work ;-)


Well that doesn't seem like a fair statement. It's potentially true,
but it's really hard to say without an implementation! ;)

I think that the idea behind cpusets is really good, essentially
creating isolated areas of CPUs and memory for tasks to run
undisturbed. I feel that the actual implementation, however, is taking
a wrong approach, because it attempts to use the cpus_allowed mask to
override the scheduler in the general case. cpus_allowed, in my
estimation, is meant to be used as the exception, not the rule. If we
wish to change that, we need to make the scheduler more aware of it, so
it can do the right thing(tm) in the presence of numerous tasks with
varying cpus_allowed masks. The other option is to implement cpusets in
a way that doesn't use cpus_allowed. That is the option that I am
pursuing.

My idea is to make sched_domains much more flexible and dynamic. By
adding locking and reference counting, and simplifying the way in which
sched_domains are created, linked, unlinked and eventually destroyed we
can use sched_domains as the implementation of cpusets. IA64 already
allows multiple sched_domains trees without a shared top-level domain. My proposal is to make this functionality more generally available. Extending the "isolated domains" concept a little further will buy us
most (all?) the functionality of "exclusive" cpusets without the need to
use cpus_allowed at all.

I've got some code. I'm in the midst of pushing it forward to rc3-mm2. I'll post an RFC later today or tomorrow when it's cleaned up.

-Matt

Sorry to reply a long quiet thread, but I've been trading emails with Paul Jackson on this subject recently, and I've been unable to convince either him or myself that merging CPUSETs and CKRM is as easy as I once believed. I'm still convinced the CPU side is doable, but I haven't managed as much success with the memory binding side of CPUSETs. In light of this, I'd like to remove my previous objections to CPUSETs moving forward. If others still have things they want discussed before CPUSETs moves into mainline, that's fine, but it seems to me that CPUSETs offer legitimate functionality and that the code has certainly "done its time" in -mm to convince me it's stable and usable.

-Matt
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/