Re: [RFC/PATCH] cpuset: cpuset irq affinities

From: Peter Zijlstra
Date: Wed Mar 05 2008 - 03:38:46 EST



On Tue, 2008-03-04 at 19:11 -0600, Paul Jackson wrote:
> Max K wrote:
> > Yeah, that would definitely be awkward.
>
> Yeah - agreed - awkward.
>
> Forget that idea (allowing the same irq in multiple 'irqs' files.)
>
> It seems to me that we get into trouble trying to cram that 'system'
> cpuset into the cpuset hierarchy, where that system cpuset is there to
> hold a list of irqs, but is only partially a good fit for the existing
> cpuset hierarchy.
>
> Could this irq configuration be partly a system-wide configuration
> decision (which irqs are 'system' irqs), and partly a per-cpuset
> decision -- which cpusets (such as a real-time one) want to disable
> the usual system irqs that everyone else gets.
>
> The cpuset portion of this should take only a single per-cpuset Boolean
> flag -- which if set True (1), asks the system to "please leave my CPUs
> off the list of CPUs receiving the usual system irqs."
>
> Then the list of "usual system irqs" would be established in some /proc
> or /sys configuration. Such irqs would be able to go to any CPUs
> except those CPUs which found themselves in a cpuset with the above
> per-cpuset Boolean flag set True (1).

How about we make this an in-kernel 'boot' set that by default contains
all IRQs, all unbound kthreads and all of user-space?

To be compatible with your existing clients, you only need to move all
the IRQs to the root domain.

(Upgrading a kernel would require distributing some new userspace
anyway, right? And we could offer a .config option to disable the boot
set for those who do upgrade kernels without upgrading user-space.)

Then, once you want to make use of the new features, you have to update
your batch scheduler to only make use of load_balance and not
cpus_exclusive (as they're only interested in sched_domains, right?).

So if you want to do IRQ isolation and batch scheduling on the same
machine (which is not possible now), you need to update userspace as
said before, so that it allows for the overlapping cpusets.

For example, on a 32 cpu machine:

/cgroup/boot              0-1    (kthreads - initial userspace)
/cgroup/irqs              0-27   (most irqs)
/cgroup/batch_A           2-5
/cgroup/batch_B           6-13
/cgroup/another_big_app   14-27
/cgroup/RT-domain         28-31  (my special irq)
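
Purely to make that concrete, a user-space sketch of how such a layout
could be created (illustrative only: the /cgroup mount point and the
'cpus'/'sched_load_balance' file names depend on how the cpuset
controller ends up being mounted, and I've left the per-cpuset 'irqs'
files out because their exact format is what's under discussion here):

#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>

static void write_file(const char *path, const char *val)
{
	FILE *f = fopen(path, "w");

	if (!f || fputs(val, f) == EOF) {
		perror(path);
		exit(1);
	}
	fclose(f);
}

static void make_set(const char *name, const char *cpus)
{
	char path[256];

	snprintf(path, sizeof(path), "/cgroup/%s", name);
	if (mkdir(path, 0755) && errno != EEXIST) {
		perror(path);
		exit(1);
	}

	snprintf(path, sizeof(path), "/cgroup/%s/cpus", name);
	write_file(path, cpus);
}

int main(void)
{
	/* no load balancing over the root set, the child sets form the domains
	 * (assuming the root flag behaves like the existing sched_load_balance) */
	write_file("/cgroup/sched_load_balance", "0");

	make_set("boot",            "0-1");	/* kthreads - initial userspace */
	make_set("irqs",            "0-27");	/* most irqs */
	make_set("batch_A",         "2-5");
	make_set("batch_B",         "6-13");
	make_set("another_big_app", "14-27");
	make_set("RT-domain",       "28-31");	/* my special irq */

	return 0;
}

The 'irqs' files of /cgroup/irqs and /cgroup/RT-domain would then be
populated on top of that.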

So we provide a .config option for strict backward compatibility, and a
simple way to get runtime compatibility (moving all IRQs to the root),
which should be easy to do if the kernel upgrade is accompanied by a
(limited) user-space upgrade.

And once all the features need to be used together (something that is
not possible now, so this is new usage), the code that relies on
cpus_exclusive to create sched_domains needs to be changed to use
load_balance instead.

Does that sound like a feasible plan?

> How does all this interact with /proc/irq/N/smp_affinity?

Much the same way a cpuset's cpus_allowed interacts with a task's
cpus_allowed. That is, cs->cpus_allowed acts as a mask on top of the
user-provided affinity.

If for some reason cs->cpus_allowed changes in such a way that the
user-specified mask becomes empty (irq->cpus_allowed & cs->cpus_allowed
== 0), then print a message and set it to the cpuset's full mask
(irq->cpus_allowed = cs->cpus_allowed).

If for some reason cs->cpus_allowed changes in such a way that the mask
is physically impossible (set_irq_affinity(cs->cpus_allowed) fails),
then print a message and move the IRQ to the parent set.
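
Roughly, as code (a sketch only, written as a stand-alone program with a
flat bitmask; cpuset_apply_irq_affinity() and the stubbed
set_irq_affinity() are made-up names for illustration, the real thing
would of course use cpumask_t and the genirq affinity code):

#include <stdio.h>

typedef unsigned long cpumask;	/* one bit per CPU; enough for a sketch */

/*
 * Stand-in for the low-level affinity setter; it may fail when the
 * hardware cannot route the interrupt to any CPU in the mask.
 * Here we pretend IRQ 9 can only be routed to CPUs 0-3.
 */
static int set_irq_affinity(int irq, cpumask mask)
{
	if (irq == 9 && !(mask & 0xfUL))
		return -1;
	return 0;
}

/*
 * Apply the cpuset's cpus_allowed as a mask on top of the user-specified
 * IRQ affinity.  Returns 0 when the IRQ can stay in this cpuset, -1 when
 * it has to be moved to the parent set.
 */
static int cpuset_apply_irq_affinity(int irq, cpumask *irq_allowed,
				     cpumask cs_allowed)
{
	cpumask effective = *irq_allowed & cs_allowed;

	if (!effective) {
		/* user mask no longer intersects the cpuset: use the full mask */
		printf("irq %d: affinity empty after cpuset change, resetting\n",
		       irq);
		*irq_allowed = cs_allowed;
		effective = cs_allowed;
	}

	if (set_irq_affinity(irq, effective)) {
		/* physically impossible: punt the IRQ to the parent cpuset */
		printf("irq %d: cannot be routed within this cpuset, move to parent\n",
		       irq);
		return -1;
	}

	return 0;
}

int main(void)
{
	cpumask user = 0xf0;	/* user asked for CPUs 4-7 */
	cpumask cs   = 0x0f;	/* the cpuset now only spans CPUs 0-3 */

	cpuset_apply_irq_affinity(9, &user, cs);
	return 0;
}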


