Re: Using cpusets for configuration/isolation [Was Re: RT sched:cpupri_vec lock contention with def_root_domain and no load balance]

From: Max Krasnyansky
Date: Wed Nov 19 2008 - 00:14:31 EST


Nish Aravamudan wrote:
> On Tue, Nov 18, 2008 at 5:59 PM, Max Krasnyansky <maxk@xxxxxxxxxxxx> wrote:
>> I do not see how 'partfs' that you described would be different from
>> 'cpusets' that we have now. Just ignore 'tasks' files in the cpusets and you
>> already have your 'partfs'. You do _not_ have to use cpuset for assigning
>> tasks if you do not want to. Just use them to define sets of cpus and keep
>> all the tasks in the 'root' set. You can then explicitly pin your threads
>> down with pthread_set_affinity().
>
> I guess you're right. It still feels a bit kludgy, but that is probably just me.
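For reference, the explicit pinning mentioned above can be sketched from
userspace. This is only an illustration in Python, assuming Linux and the
standard os.sched_setaffinity() call (the C counterpart would be
pthread_setaffinity_np()); the helper name pin_to_cpus is made up for this
example.

```python
import os

def pin_to_cpus(cpus):
    """Restrict the caller to the given CPU numbers and return the
    affinity mask that is in effect afterwards."""
    # pid 0 means "the caller"; the kernel rejects an empty or invalid set.
    os.sched_setaffinity(0, set(cpus))
    return os.sched_getaffinity(0)
```

A thread would call something like pin_to_cpus({2, 3}) once at startup, where
2 and 3 are the cpus listed in the cpuset that was defined for it.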
>
> I have wondered, though, if it makes sense to provide an "isolated"
> file in /sys/devices/system/cpu/cpuX/ to do most of the offline
> sequence, break sched_domains and remove a CPU from the load balancer
> (rather than turning the load balancer off), rather than requiring a
> user to explicitly do an offline/online.
I do not see any benefit in exposing a special 'isolated' bit that does the
same thing cpu hotplug already does. As I explained in other threads, cpu
hotplug is a _perfect_ fit for isolation purposes. In order to isolate a CPU
dynamically (i.e. at runtime) we need to flush pending work, flush caches,
move tasks and timers, etc., which is _exactly_ what the cpu hotplug code does
when it brings a CPU down. There is no point in reimplementing it.
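
To illustrate, the offline/online cycle can be driven through the sysfs
'online' file. A minimal sketch in Python, assuming the usual
/sys/devices/system/cpu/cpuN/online interface and root privileges; the helper
names set_cpu_online and cycle_cpu are hypothetical, and this is not what
'syspart' itself runs.

```python
import os

SYSFS_CPU = "/sys/devices/system/cpu"  # standard sysfs location for CPUs

def set_cpu_online(cpu, online, sysfs_root=SYSFS_CPU):
    """Write 1 or 0 to cpuN/online to bring a CPU up or take it down."""
    path = os.path.join(sysfs_root, "cpu%d" % cpu, "online")
    with open(path, "w") as f:
        f.write("1" if online else "0")

def cycle_cpu(cpu, sysfs_root=SYSFS_CPU):
    """Take a CPU down and bring it back up. Taking it down is what makes
    the kernel migrate tasks and timers away and flush pending work."""
    set_cpu_online(cpu, False, sysfs_root)
    set_cpu_online(cpu, True, sysfs_root)
```

The sysfs_root parameter exists only so the helpers can be pointed at a
scratch directory when trying them out without root.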

btw, it sounds like you misunderstood the meaning of the
cpuset.sched_load_balance flag. It does not really turn the load balancer
off; it simply causes cpus in different cpusets to be put into separate sched
domains. In other words, it already does exactly what you're asking for.

> I guess it can all be rather
> transparently masked via a userspace tool, but we don't have a common
> one yet.
I do :). It's called 'syspart'
http://git.kernel.org/?p=linux/kernel/git/maxk/syspart.git;a=summary
I'll push an updated version in a couple of days.

> I do have a question, though: is your recommendation to just turn the
> load balancer off in the cpuset you create that has the isolated CPUs?
> I guess the conceptual issue I was having was that the root cpuset (I
> think) always contains all CPUs and all memory nodes. So even if you
> put some CPUs in a cpuset under the root one, and isolate them using
> hotplug + disabling the load balancer in that cpuset, those CPUs are
> still available to tasks in the root cpuset? Maybe I'm just missing a
> step in the configuration, but it seems like as long as the global
> (root cpuset) load balancer is on, a CPU can't be guaranteed to stay
> isolated?
Take a look at what 'syspart' does. In short: yes, of course we need to set
the sched_load_balance flag in the root cpuset to 0.
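
A rough sketch of that configuration step, in Python. The assumptions here are
mine: the cpuset filesystem is mounted at /dev/cpuset, the control files are
named 'cpus', 'mems' and 'sched_load_balance' (pass prefix="cpuset." if your
mount exposes them as cpuset.cpus and so on), and load balancing is also
switched off inside the new set. The function names are hypothetical and this
is not taken from 'syspart'.

```python
import os

def write_flag(cpuset_dir, name, value, prefix=""):
    """Write one control file of a cpuset directory."""
    with open(os.path.join(cpuset_dir, prefix + name), "w") as f:
        f.write(str(value))

def isolate_cpus(cpus, mems="0", name="isolated",
                 mount="/dev/cpuset", prefix=""):
    """Create a child cpuset holding `cpus` and clear sched_load_balance
    in the root set so the child ends up in its own sched domain."""
    child = os.path.join(mount, name)
    os.makedirs(child, exist_ok=True)  # mkdir in cpusetfs creates the set
    write_flag(child, "cpus", cpus, prefix)
    write_flag(child, "mems", mems, prefix)
    # Assumption: no balancing wanted among the isolated cpus either.
    write_flag(child, "sched_load_balance", 0, prefix)
    # The step discussed above: the root set must stop balancing.
    write_flag(mount, "sched_load_balance", 0, prefix)
    return child
```

For example, isolate_cpus("2-3") would define a set named 'isolated' with
cpus 2 and 3. No 'tasks' file is touched, so every task stays in the root set
and threads are placed on the isolated cpus only by explicit affinity.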

Max




