Re: [PATCH] cpuset and sched domains: sched_load_balance flag

From: Paul Jackson
Date: Sun Sep 30 2007 - 14:08:20 EST

Nick wrote:
> The user should just be able to specify exactly the partitioning of
> tasks required, and cpusets should ask the scheduler to do the best
> job of load balancing possible.

If the cpusets which have 'sched_load_balance' enabled are disjoint
(their 'cpus' cpus_allowed masks don't overlap) then you get exactly
what you're asking for. In that case there is exactly one sched domain
for the 'cpus' allowed by each cpuset that has 'sched_load_balance'
enabled.
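
As a rough illustration of the disjoint case versus the overlapping
case, here is a toy model, not code from the patch. It assumes CPUs fit
in a 64-bit mask, and the names cpu_mask and build_domains are
hypothetical. Each input mask stands for the 'cpus' of one cpuset that
has 'sched_load_balance' enabled; masks that overlap are merged, so
disjoint inputs come back unchanged, one domain per cpuset.

```c
#include <stdint.h>
#include <stddef.h>

/* Toy stand-in for a CPU mask: bit N set means CPU N is included.
 * Hypothetical type; the kernel uses cpumask_t, not this. */
typedef uint64_t cpu_mask;

/*
 * Merge overlapping masks in place until all remaining masks are
 * pairwise disjoint. Returns the number of masks left, which in this
 * toy model is the number of sched domains.
 */
size_t build_domains(cpu_mask *masks, size_t n)
{
    int merged = 1;

    while (merged) {            /* repeat until a pass makes no merge */
        merged = 0;
        for (size_t i = 0; i < n; i++) {
            for (size_t j = i + 1; j < n; j++) {
                if (masks[i] & masks[j]) {
                    masks[i] |= masks[j];     /* absorb j into i */
                    masks[j] = masks[n - 1];  /* fill the hole */
                    n--;
                    merged = 1;
                    j--;        /* re-check the mask moved into slot j */
                }
            }
        }
    }
    return n;
}
```

With inputs 0x0F and 0xF0 (CPUs 0-3 and CPUs 4-7) nothing overlaps, so
two domains remain. With 0x03, 0x06 and 0x30 the first two share CPU 1
and collapse into 0x07, again leaving two domains.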

But there is another case in which one does not want what you ask for.

That case involves the situation where one is running a third party
batch scheduler on part of one's big system, and doing other stuff
(perhaps Ingo's realtime stuff) on another part of the system.

In that case, the system admin will be advised to turn off
sched_load_balance on the top cpuset. But in that case the system
admin will -not- know from moment to moment what jobs the batch
scheduler is running on the cpus assigned to its control. Only the
batch scheduler knows that.

The batch scheduler is code that was written by someone else, in
some other company, some other time. That code does not get to
control the overall sched domain partitioning of the entire system.
The batch scheduler gets to say, in effect:

    Here's where I need load balancing to occur, in the normal fashion,
    and here's where I don't need it.

In short, you are insisting that only a single administrative point of
control determine the system's sched domains. Sometimes that fits
the way the system is managed, and my patch lets you do that. But
sometimes this is a shared responsibility, between a piece of third
party software and the system admin, and my patch allows for that
case as well.
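
To make the shared-responsibility case concrete, here is a small
sketch of a cpuset hierarchy. It is a toy model under stated
assumptions, not the patch itself: struct toy_cpuset, its fields other
than the 'cpus' and 'sched_load_balance' ideas, and balanced_cpus() are
all hypothetical names. The function walks the tree and returns the
union of the CPUs of every cpuset that has the flag enabled, that is,
the CPUs on which somebody has asked for load balancing.

```c
#include <stdint.h>
#include <stddef.h>

typedef uint64_t cpu_mask;      /* toy mask: bit N set means CPU N */

/* Hypothetical, simplified view of one node in the cpuset hierarchy. */
struct toy_cpuset {
    const char *name;
    cpu_mask cpus;                      /* CPUs allowed in this cpuset */
    int sched_load_balance;             /* 1 = balance across 'cpus' */
    const struct toy_cpuset *children;  /* array of child cpusets */
    size_t nchildren;
};

/*
 * Union of the 'cpus' of every cpuset in the subtree rooted at cs
 * that has sched_load_balance enabled.
 */
cpu_mask balanced_cpus(const struct toy_cpuset *cs)
{
    cpu_mask result = cs->sched_load_balance ? cs->cpus : 0;

    for (size_t i = 0; i < cs->nchildren; i++)
        result |= balanced_cpus(&cs->children[i]);
    return result;
}
```

In this model the system admin would clear the flag on the top cpuset
and on a realtime cpuset, while the batch scheduler sets or clears the
flag on the per-job cpusets it creates underneath its own cpuset.
Neither party needs to know the other's settings for the walk to give
the right answer.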

This is a typical sort of situation that arises from having hierarchical
cpuset definitions, and highlights the reason (and the use case,
involving third party batch schedulers) that I went with a hierarchical
cpuset architecture in the first place.

I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@xxxxxxx> 1.925.600.0401