Re: [RFC] Cpuset: explicit dynamic sched domain control flags

From: Siddha, Suresh B
Date: Tue Oct 17 2006 - 22:21:56 EST


On Tue, Oct 17, 2006 at 12:18:23PM -0700, Paul Jackson wrote:
> > What happens when the job in the cpuset with no sched domain
> > becomes active? In this case, scheduler can't make use of all cpus
> > that this cpuset is allowed to use.
>
> What happens then is that the job manager marks the cpuset of this
> newly activated job as being a sched_domain.

With your patch, that will fail, because there is already a cpuset
defining a sched domain that overlaps with the one that is becoming
active.

So the job manager needs to set/reset these flags whenever jobs in
overlapping cpusets become active/inactive (see the sketch below). Is
that where you are going with this patch?

What happens when both these jobs/cpusets are active at the same time?
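For concreteness, here is a minimal userspace sketch of that set/reset
dance. The per-cpuset file name "sched_domain_enabled" and the cpuset
paths are assumptions lifted from your example further down, not from
the patch itself:

#include <stdio.h>

/* Hypothetical helper -- assumes the RFC's flag is exposed as a file
 * named "sched_domain_enabled" under each cpuset directory. */
static int set_sched_domain(const char *cpuset, int enable)
{
	char path[128];
	FILE *f;

	snprintf(path, sizeof(path),
		 "/dev/cpuset/%s/sched_domain_enabled", cpuset);
	f = fopen(path, "w");
	if (!f)
		return -1;
	fprintf(f, "%d\n", enable);
	return fclose(f);
}

/* On every activation the manager must first clear the flag in each
 * overlapping cpuset, or (with your patch) setting the new one fails: */
int activate_job(void)
{
	if (set_sched_domain("cs1/suba", 0))	/* previously active job */
		return -1;
	return set_sched_domain("cs1", 1);	/* newly activated job */
}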

>
> And if the job manager doesn't do that, and sets up a situation in
> which the scheduler domains don't line up with the active jobs, then
> they can't get scheduler load balancing across all the CPUs in those
> jobs cpusets. That's exactly what they asked for -- that's exactly
> what they got.
>
> (Actually, is that right? I thought load balancing would still occur
> at higher levels in the sched domain/group hierarchy, just not as
> often.)

No. Once the sched domains are partitioned, there is no load balancing
or scheduling interaction between those partitions, at any level of the
hierarchy.
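To make that concrete: the balancer only ever walks a CPU's own sched
domain tree upwards, and partitioning rebuilds an independent tree per
partition. Roughly (simplified from kernel/sched.c and from memory, so
treat it as a sketch):

void partition_sched_domains(cpumask_t *partition1, cpumask_t *partition2)
{
	cpumask_t change_map;

	cpus_or(change_map, *partition1, *partition2);

	/* detach the affected cpus from their old domains ... */
	detach_destroy_domains(&change_map);

	/* ... and build one self-contained domain tree per partition;
	 * load balancing never crosses from one tree into the other */
	if (!cpus_empty(*partition1))
		build_sched_domains(partition1);
	if (!cpus_empty(*partition2))
		build_sched_domains(partition2);
}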

>
> It is not the kernel's job to make it impossible for user code to do
> stupid things. It's the kernel's job to offer up various mechanisms,
> and let user space code decide what to do when.
>
> And, anyhow, how does this differ from overloading the cpu_exclusive
> flag to define sched domains? One can set up the same thing there,
> where a job can't balance across all its CPUs:
>
> /dev/cpuset/cs1 cpu_exclusive = 1; cpus = 0-7
> /dev/cpuset/cs1/suba cpu_exclusive = 1; cpus = 0-3
> /dev/cpuset/cs1/subb cpu_exclusive = 1; cpus = 4-7
>
> (sched_domain_enabled = 0 in all cpusets)
>
> If you put a task in cpuset "cs1" (not in one of the sub cpusets)
> then it can't load balance between CPUs 0-3 and CPUs 4-7 (or can't
> load balance as often - depending on how this works.)

hmm... tasks in "cs1" won't be properly balanced across CPUs 0-7.
In this case, shouldn't we remove CPUs 0-3 from "cs1" cpus_allowed?

The current code makes sure that "suba"'s cpus are removed from "cs1"'s
sched domain, but still allows tasks in "cs1" to run on "suba"'s cpus.
I don't know much about how the job manager interacts with cpusets,
but this behavior sounds bad to me.
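To spell out what the current code does (paraphrasing
kernel/cpuset.c:update_cpu_domains() from memory, so only a sketch):

/* The parent's sched domain span is its cpus_allowed minus the cpus
 * of every cpu_exclusive child -- but tasks in the parent keep the
 * full cpus_allowed mask: */
pspan = par->cpus_allowed;
list_for_each_entry(c, &par->children, sibling) {
	if (is_cpu_exclusive(c))
		cpus_andnot(pspan, pspan, c->cpus_allowed);
}

/* So for Paul's example: pspan(cs1) = {0-7} - {0-3} - {4-7} = empty,
 * while tasks in cs1 still have cpus_allowed = 0-7, i.e. they sit on
 * cpus that no longer belong to any sched domain of cs1's own. */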

copying Nick to get his thoughts..

thanks,
suresh