Re: scheduler scalability - cgroups, cpusets and load-balancing

From: Gregory Haskins
Date: Tue Jan 29 2008 - 12:28:21 EST


>>> On Tue, Jan 29, 2008 at 11:51 AM, in message
<20080129105104.d70f36ef.pj@xxxxxxx>, Paul Jackson <pj@xxxxxxx> wrote:
> Gregory wrote:
>> This is correct. We have the balance policy polymorphically associated
>> with each sched_class, and the CFS load-balancer and RT "load" (really,
>> priority) balancer can coexist together at the same time and across
>> arbitrary #s of cores
>
> So ... we have the option of having all sched_classes coexist
> polymorphically.
>
> That I didn't realize until this thread.

It's on a per-task basis: each task elects SCHED_FIFO/RR/BATCH/OTHER, etc. If the task is on a particular RQ, the RQ operates under the policy of that task's class. There are some cases where the RQ consults the policy of all classes, but those cases are still influenced by whether there are actual tasks running within the scope of the current cpuset (or root-domain).
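
To illustrate what "polymorphically associated" means here, the following is a toy userspace model, not kernel code. Every name in it (toy_class, toy_rq, toy_balance_all, and so on) is hypothetical, and the two balance hooks are crude stand-ins: the real RT balancer compares priorities, whereas this sketch only counts tasks. The point is just that each class carries its own balance hook, and a class with no tasks on the run queue contributes nothing.

```c
#include <stddef.h>

/* Hypothetical model only; these are NOT the kernel's definitions. */
enum toy_class_id { TOY_RT, TOY_FAIR, TOY_NR_CLASSES };

/* A run queue reduced to a count of runnable tasks per class. */
struct toy_rq {
    int nr_running[TOY_NR_CLASSES];
};

/* Each class supplies its own balance hook; it returns tasks moved. */
struct toy_class {
    const char *name;
    int (*balance)(struct toy_rq *this_rq, struct toy_rq *other);
};

/* RT stand-in: push surplus RT tasks to a queue that has none. */
static int rt_balance(struct toy_rq *this_rq, struct toy_rq *other)
{
    int moved = 0;

    while (this_rq->nr_running[TOY_RT] > 1 &&
           other->nr_running[TOY_RT] == 0) {
        this_rq->nr_running[TOY_RT]--;
        other->nr_running[TOY_RT]++;
        moved++;
    }
    return moved;
}

/* Fair stand-in: move tasks until the counts differ by at most one. */
static int fair_balance(struct toy_rq *this_rq, struct toy_rq *other)
{
    int moved = 0;

    while (this_rq->nr_running[TOY_FAIR] >
           other->nr_running[TOY_FAIR] + 1) {
        this_rq->nr_running[TOY_FAIR]--;
        other->nr_running[TOY_FAIR]++;
        moved++;
    }
    return moved;
}

static const struct toy_class toy_classes[TOY_NR_CLASSES] = {
    [TOY_RT]   = { "rt",   rt_balance },
    [TOY_FAIR] = { "fair", fair_balance },
};

/* Run each class's hook, skipping classes with no tasks on this_rq. */
static int toy_balance_all(struct toy_rq *this_rq, struct toy_rq *other)
{
    int moved = 0;

    for (int i = 0; i < TOY_NR_CLASSES; i++) {
        if (this_rq->nr_running[i] == 0)
            continue;
        moved += toy_classes[i].balance(this_rq, other);
    }
    return moved;
}
```

In this model a queue holding only fair tasks never invokes the RT hook, which is the sense in which the effective policy follows the tasks that are actually present.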

>
> Now ... do we -want- to ?)

I think so, yes. But I will give the disclaimer that I don't fully understand your world ;)

You could certainly create a group of cpus with homogeneous policy by creating a cpuset with only tasks of a single class as members. But likewise, if you populate a cpuset with tasks from mixed classes, you have mixed balance policy affecting those cpus.
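
As a rough sketch of how such a cpuset might be populated from user space: the helpers below create a cpuset directory, assign it CPUs and memory nodes, and attach a task by pid. They assume the legacy cpuset filesystem with control files named "cpus", "mems", and "tasks" (a cgroup mount prefixes these with "cpuset."). The helper names and the directory passed in are hypothetical, and the mount point is whatever your system uses.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Write one value into <dir>/<file>. Returns 0 or a negative errno. */
static int cpuset_write(const char *dir, const char *file, const char *value)
{
    char path[4096];
    FILE *f;
    int n, rc = 0;

    n = snprintf(path, sizeof(path), "%s/%s", dir, file);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -ENAMETOOLONG;

    f = fopen(path, "w");
    if (!f)
        return -errno;
    if (fputs(value, f) == EOF)
        rc = -errno;
    /* The write may only be reported as failed when the file is closed. */
    if (fclose(f) == EOF && rc == 0)
        rc = -errno;
    return rc;
}

/* Create the cpuset directory, then give it CPUs and memory nodes. */
static int cpuset_create(const char *dir, const char *cpus, const char *mems)
{
    int rc;

    if (mkdir(dir, 0755) == -1 && errno != EEXIST)
        return -errno;
    rc = cpuset_write(dir, "cpus", cpus);
    if (rc == 0)
        rc = cpuset_write(dir, "mems", mems);
    return rc;
}

/* Move a task into the cpuset by writing its pid to "tasks". */
static int cpuset_attach(const char *dir, pid_t pid)
{
    char buf[32];

    snprintf(buf, sizeof(buf), "%ld", (long)pid);
    return cpuset_write(dir, "tasks", buf);
}
```

Assuming the filesystem is mounted at /dev/cpuset, a call such as cpuset_create("/dev/cpuset/rt_only", "2-3", "0") followed by cpuset_attach() for each SCHED_FIFO task would give a single-class set; attaching tasks of other classes to the same directory gives the mixed case. The name "rt_only" is only an example.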


>
> That is, what is the easiest kernel-user API to work with and understand?
>
> Is it one where we essentially expose sched_class to user space, and let
> them pick their sched_class, or pick none of the above (don't balance)?

IMHO it works well the way it is: The user selects the class for a particular task using sched_setscheduler(), and they select the cpuset (or inherit it) that defines its execution scope. If that scope has balancing enabled, the policy for the member classes is in effect.
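
For reference, the class-selection half of that looks roughly like the sketch below. sched_setscheduler() and struct sched_param are the standard interfaces; the wrapper names elect_policy() and policy_name() are made up for illustration.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <string.h>
#include <sys/types.h>

/* Map a policy constant to a printable name (illustrative helper). */
static const char *policy_name(int policy)
{
    switch (policy) {
    case SCHED_OTHER: return "SCHED_OTHER";
    case SCHED_FIFO:  return "SCHED_FIFO";
    case SCHED_RR:    return "SCHED_RR";
#ifdef SCHED_BATCH
    case SCHED_BATCH: return "SCHED_BATCH";
#endif
    default:          return "unknown";
    }
}

/*
 * Elect a scheduling policy for one task; pid 0 means the caller.
 * Returns 0 on success or a negative errno value.
 */
static int elect_policy(pid_t pid, int policy, int rt_priority)
{
    struct sched_param sp;

    memset(&sp, 0, sizeof(sp));
    /* Only SCHED_FIFO and SCHED_RR take a nonzero static priority. */
    if (policy == SCHED_FIFO || policy == SCHED_RR)
        sp.sched_priority = rt_priority;

    if (sched_setscheduler(pid, policy, &sp) == -1)
        return -errno;
    return 0;
}
```

A call like elect_policy(0, SCHED_FIFO, 10) will normally fail with -EPERM unless the caller has the privilege to use the realtime policies. The cpuset half is done separately through the cpuset filesystem, or simply inherited from the parent.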

(On this topic, note that I do not know whether the RT-balancer respects the cpuset concept of "balance-enabled" at all. That might have to be fixed.)

Again, the disclaimer that I do not have expertise in your area, so perhaps this is naive.

>
> Or is it one where, other than the special case my batch schedulers need
> to not balance at all, we expose nothing more to user space, and provide
> all sched_class load balancers to all sched_domains (other than those
> not balanced at all)?


