Re: Using cpusets for configuration/isolation [Was Re: RT sched:cpupri_vec lock contention with def_root_domain and no load balance]

From: Max Krasnyansky
Date: Tue Nov 18 2008 - 21:00:16 EST


Nish Aravamudan wrote:
Perhaps this is not a welcome comment, but I have been wondering this
as I spent some time playing with CPU isolation. Are cpusets the right
interface for system configuration?

It seems to me, and the Documentation agrees, that cpusets are
designed around tasks and around constraining, in various ways, which
system resources those tasks may use. They may not have been
originally designed for configuring the system resources themselves at
the system level. Now obviously these constraints will have
interactions with things like CPU hotplug, sched domains, etc. But it
does not seem obvious to me that cpusets *should* be the recommended
way to achieve isolation.

It *almost* makes sense to me to have a separate interface for system
configuration, perhaps in a system filesystem ... say sysfs :) ...
that could be used to indicate a given CPU should be isolated from the
remainder of the system. It could take the form of a file just like
"online", perhaps called "isolated". But rather than go all the way
through the hotplug sequence as writing to "online" does, it just goes
"through the motions" and then brings the CPU back up. In fact, we
could do more than we do with cpusets-based isolation, like removing
workqueues and stop machine. We would have an isolated_map (I guess)
that corresponds to those CPUs with isolated=1 and provide that list
in /sys/devices/system/cpu like the online file.
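
For reference, the existing "online" file reports CPUs as a list of
ids and ranges such as "0-3,8", and a list "like the online file"
would presumably use the same format. A minimal sketch of reading such
a file follows. The helper names parse_cpulist and read_cpulist are
made up for illustration, only plain ids and a-b ranges are handled,
and pointing it at an "isolated" file assumes the interface proposed
here exists on the running kernel.

```python
def parse_cpulist(text):
    """Parse a cpulist string such as "0-3,8" into a sorted list of CPU ids.

    Only plain ids and inclusive "a-b" ranges are handled (an assumption
    made to keep the sketch short).
    """
    cpus = set()
    for chunk in text.strip().split(","):
        if not chunk:
            continue  # an empty file means an empty CPU set
        if "-" in chunk:
            first, last = chunk.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(chunk))
    return sorted(cpus)


def read_cpulist(path="/sys/devices/system/cpu/online"):
    """Read and parse a cpulist file; the default is the existing "online" file."""
    with open(path) as f:
        return parse_cpulist(f.read())
```

Calling read_cpulist() with no argument lists the online CPUs; passing
a different path would read any other file in the same format.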

Or perhaps it makes more sense to present a filesystem *just* for
system partitioning (partfs?). The root directory would have all the
CPUs (for now, perhaps memory should be there too) and administrators
could create isolated groups of CPUs. But we wouldn't present a
transparent way to assign tasks to isolated CPUs (the tasks file) and
the root directory would automatically lose CPUs placed in its
subdirectories. Perhaps the latter is supported in cpusets by the
cpu_exclusive flag, but let me just say the Documentation is pretty
bad. The only reference to what this flag does:

" - cpu_exclusive flag: is cpu placement exclusive?"

I can't tell exactly what the author means by exclusive here.

This feels like something I read Max K. proposing a while ago, and I'm
sorry if it has already been Nak'd then. It just feels like we're
shoehorning system configuration into cpusets in a way that isn't the
most straightforward, when we have an existing system layout that
should work or could design one that is sane.

What you described is almost exactly what I did in my original cpu isolation patch, which did get NAKed :). Basically I used a global cpu_isolated_map and exposed an 'isolated' bit, etc.

I do not see how the 'partfs' that you described would be different from the 'cpusets' that we have now. Just ignore the 'tasks' files in the cpusets and you already have your 'partfs'. You do _not_ have to use cpusets for assigning tasks if you do not want to. Just use them to define sets of cpus and keep all the tasks in the 'root' set. You can then explicitly pin your threads down with pthread_setaffinity_np().
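
A rough sketch of that workflow, in Python rather than C for brevity. It assumes the legacy cpuset filesystem is mounted somewhere (for example with "mount -t cpuset" on /dev/cpuset), where each set exposes 'cpus' and 'mems' files; on a cgroup mount the same files carry a "cpuset." prefix. The helper names define_cpuset and pin_current_thread, the set name "rt" and the CPU numbers are made up for illustration. os.sched_setaffinity() with pid 0 is used as the Python counterpart of pthread_setaffinity_np() on the calling thread.

```python
import os


def define_cpuset(root, name, cpus, mems="0"):
    """Create a cpuset directory under `root` and assign it CPUs and memory nodes.

    No pid is ever written to the set's 'tasks' file, so every task stays
    in the root set, as suggested above.
    """
    path = os.path.join(root, name)
    os.makedirs(path, exist_ok=True)
    with open(os.path.join(path, "cpus"), "w") as f:
        f.write(cpus)  # cpulist format, e.g. "2-3"
    with open(os.path.join(path, "mems"), "w") as f:
        f.write(mems)  # memory node list, e.g. "0"
    return path


def pin_current_thread(cpu):
    """Pin the calling thread to one CPU and return the resulting affinity set."""
    os.sched_setaffinity(0, {cpu})  # pid 0 means the calling thread
    return os.sched_getaffinity(0)
```

With root privileges this could be used as define_cpuset("/dev/cpuset", "rt", "2-3"), followed by pin_current_thread(2) in the thread that should run there.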

Max