Re: [RFC/PATCH 0/4] CPUSET driven CPU isolation

From: Max Krasnyanskiy
Date: Thu Feb 28 2008 - 12:33:22 EST


Peter Zijlstra wrote:
On Wed, 2008-02-27 at 15:38 -0800, Max Krasnyanskiy wrote:
Peter Zijlstra wrote:
My vision on the direction we should take wrt cpu isolation.
General impressions:
- "cpu_system_map" is %100 identical to the "~cpu_isolated_map" as in my patches. It's updated from different place but functionally wise it's very much the same. I guess you did not like the 'isolated' name ;-). As I mentioned before I'm not hung up on the name so it's cool :).

Ah, but you miss that cpu_system_map doesn't do the one thing the
cpu_isolated_map ever did: prevent sched_domains from forming on those
cpus.

Which is a major point.
Did you see my reply on how to support the "RT balancer" with cpu_isolated_map?
Anyway, my point was that you could've just told me something like:
"let's rename cpu_isolated_map and invert it"
That's it. The gist of the idea is exactly the same: there is a bitmap that tells various subsystems which CPUs can be used for kernel activities.
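Just to make the equivalence concrete, here is a sketch of what I mean (the helper names are made up, and cpu_system_map here is only for illustration):

#include <linux/cpumask.h>

/* My patches: CPUs the kernel should keep its own activity off of. */
extern cpumask_t cpu_isolated_map;

/* Your patches: CPUs the kernel may use for its own activity. */
cpumask_t cpu_system_map = CPU_MASK_ALL;

/* Made-up helper: the two maps are simply complements of each other. */
static void sync_system_map(void)
{
	cpus_complement(cpu_system_map, cpu_isolated_map);
}

/* A subsystem deciding where to run housekeeping work would then check: */
static int cpu_usable_by_kernel(int cpu)
{
	return cpu_isset(cpu, cpu_system_map);
}

Whichever way the bit is named, it carries the same information.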

- Updating cpu_system_map from cpusets
There are a couple of things that I do not like about this approach:
1. We lose the ability to isolate CPUs at boot, which means slower boot times for me (i.e. before I can start my apps I need to create a cpuset, etc.). Not a big deal, I can live with it.

I'm sure those few lines in rc.local won't grow your boot time by any
significant amount.

That said, we should look into a replacement for the boot time parameter
(if only to decide not to do it) because of backward compatibility.
As I said, I can live with it.
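For reference, the existing boot-time hook is tiny. From memory, the isolcpus= handler in kernel/sched.c looks roughly like this:

static int __init isolated_cpu_setup(char *str)
{
	int ints[NR_CPUS], i;

	/* Parse the "isolcpus=<cpu list>" boot argument. */
	str = get_options(str, ARRAY_SIZE(ints), ints);
	cpus_clear(cpu_isolated_map);
	for (i = 1; i <= ints[0]; i++)
		if (ints[i] < NR_CPUS)
			cpu_set(ints[i], cpu_isolated_map);
	return 1;
}
__setup("isolcpus=", isolated_cpu_setup);

Whatever cpuset-based replacement we settle on should probably keep accepting the same syntax for backward compatibility.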

2. We now need another notification mechanism to propagate updates to the cpu_system_map. That by itself is not a big deal. The big deal is that we now need to audit the kernel and make sure that everything affected has a proper notifier and reacts to the mask changes.
For example, your current patch does not move the timers, and I do not think it makes sense to go and add a notifier for the timers. I think the better approach is to use CPU hotplug here, i.e. enforce the rule that cpu_system_map is updated only when the CPU is off-line.
By bringing the CPU down first we get a lot of functionality for free: all the kernel threads, timers, softirqs, HW irqs, workqueues, etc. are properly terminated/moved/canceled. This gives us a very clean state when we bring the CPU back online with the "system" bit cleared (or the "isolated" bit set, as in my patches). I do not see a good reason for reimplementing that functionality via system_map notifiers.
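To make that concrete: with the notifier approach, every affected subsystem would have to grow something like the following (a hypothetical sketch, not code from either patch set; all names are made up):

#include <linux/notifier.h>
#include <linux/cpumask.h>

/* Made-up notifier chain for cpu_system_map updates. */
static BLOCKING_NOTIFIER_HEAD(system_map_chain);

int register_system_map_notifier(struct notifier_block *nb)
{
	return blocking_notifier_chain_register(&system_map_chain, nb);
}

/* Called by whoever updates cpu_system_map (e.g. the cpuset code). */
void system_map_changed(cpumask_t *new_map)
{
	blocking_notifier_call_chain(&system_map_chain, 0, new_map);
}

/* ... and then timers, workqueues, irqs, etc. each need a callback: */
static int timer_system_map_callback(struct notifier_block *nb,
				     unsigned long action, void *data)
{
	/* migrate pending timers off CPUs that just left the map */
	return NOTIFY_OK;
}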

I'm not convinced cpu hotplug notifiers are the right thing here. Sure
we could easily iterate the old and new system map and call the matching
cpu hotplug notifiers, but they seem overly complex to me.

The audit would be a good idea anyway. If we do indeed end up with a 1:1
mapping of whatever cpu hotplug does, then well, perhaps you're right.
I was not talking about calling the notifiers directly. We can literally bring the CPU down, just like "echo 0 > /sys/devices/system/cpu/cpuN/online" does.
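Roughly the kind of helper I have in mind (a sketch only; the name is made up, and it assumes CONFIG_HOTPLUG_CPU so that cpu_down()/cpu_up() are available):

#include <linux/cpu.h>
#include <linux/cpumask.h>

extern cpumask_t cpu_isolated_map;	/* or ~cpu_system_map, same thing */

/*
 * Made-up helper: flip a CPU's "isolated"/"system" state only while it is
 * off-line, so the existing hotplug callbacks do all the cleanup for us
 * (timers, kthreads, softirqs, workqueues, irq affinity, ...).
 */
static int set_cpu_isolated(unsigned int cpu, int isolated)
{
	int err;

	/* Same path as "echo 0 > /sys/devices/system/cpu/cpuN/online". */
	err = cpu_down(cpu);
	if (err)
		return err;

	if (isolated)
		cpu_set(cpu, cpu_isolated_map);		/* clear the "system" bit */
	else
		cpu_clear(cpu, cpu_isolated_map);	/* set the "system" bit */

	/* Bring it back on-line; it comes up with the new state. */
	return cpu_up(cpu);
}

All the migration work happens in the existing hotplug code, so there is nothing new to audit.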

I'll comment more on the individual patches.

Next on the list would be figuring out a nice solution to the workqueue
flush issue.
Do not forget the "stop machine", or more specifically module loading/unloading.

No, the full stop machine thing: there are more interesting users than
module loading. But I'm not too interested in solving this particular
problem atm; I have too much on my plate as it is.
I was not suggesting that it's up to you to solve it. You made it sound like your patches provide a complete solution, with just the workqueues left to fix. I was merely pointing out that there is also stop machine, which needs fixing.

Btw, you did not comment on the fact that your patch does not move the timers.
I was trying it out last night, and it's definitely not ready yet.

Max