Re: [RFC/PATCH] cpuset: cpuset irq affinities

From: Peter Zijlstra
Date: Fri Feb 29 2008 - 16:15:13 EST



On Fri, 2008-02-29 at 14:55 -0600, Paul Jackson wrote:
> Like Ingo, I like the approach.
>
> But I am concerned it won't work, as stated.
>
> Unfortunately, my blithering ignorance of how one might want to
> distribute irq's across a system is making it difficult for me
> to say for sure if this works or not.
>
> The thing about /dev/cpuset that I am afraid will get in the way
> with this use of cpusets to place irqs is that we can really only
> have a single purpose hierarchy below /dev/cpuset.
>
> For example, lets say we have:
>
> /dev/cpuset
> boot
> big_special_app
> a_few_isolated_rt_nodes
> batchscheduler
> batch job 1
> batch job 2
> ...

I might just be new-fangled, but I have a /cgroup mount.
but I guess that's just different mount-point of cgroup, right?

> I guess, with your "cpuset: cpuset irq affinities" patch, we'd start
> off with /dev/cpuset/irqs listing the irqs available, and we could
> reasonably decide to move any or all irqs to /dev/cpuset/boot/irqs,
> by writing the numbers of those irqs to that file, one irq number
> per write(2) system call (as is the cpuset convention.)

Right.

> Do these irqs have any special hardware affinity? Or are they
> just consumers of CPU cycles that can be jammed onto whatever CPU(s)
> we're willing to let be interrupted?

Depends a bit, the genirq layer seems to allow for irqs that can't be
freely placed. But most of them can be given a free mask - /me looks @
tglx/ingo.

> If for reason of desired hardware affinity, or perhaps for some other
> reason that I'm not aware of, we wanted to have the combined CPUs in
> both the 'boot' and 'big_special_app' handle some irq, then we'd be
> screwed. We can't easily define, using the cpuset interface and its
> conventions, a distinct cpuset overlapping boot and big_special_app,
> to hold that irq. Any such combining cpuset would have to be the
> common parent of both the combined cpusets, an annoying intrusion on
> the expected hierarchy.
>
> If the actual set of CPUs we wanted to handle a particular irq wasn't
> even the union of any pre-existing set of cpusets, then we'd be even
> more screwed, unable even to force the issue by imposing additional
> intermediate combined cpusets to meet the need.

I see the issue. We don't support mv on cgroups, right? To easily create
common parents...

> If there is any potential for this to be a problem, then we should
> examine the possibility of making irqs their own cgroup, rather than
> piggy backing them on cpusets (which are now just one instance of a
> cgroup module.)

Hmm, but that would then be another controller based on cpus. Might be a
tad confusing. Might be needed. I'll ponder..

> Could you educate me a little, Peter, on what these irqs are and on
> the sorts of ways people might want to place them across CPUs?

I'm not sure I know what you're asking. IRQ are hardware notifiers and
do all kinds of things depending on the hardware. Network cards
typically use them to notify the CPU of incoming packets. Video cards
can do vsync notifiers, empty dma buffers, whatnot.

> > + if (s->len > 0)
> > + s->len += scnprintf(s->buf + s->len, s->buflen - s->len, " ");
>
> The other 'vector' type cpuset file, "tasks", uses a newline '\n'
> field terminator, not a space ' ' separator. Would '\n' work here,
> or is ' ' just too much the expected irq separator in such ascii lists?
> My preference is toward using the exact same vector syntax in each
> place, so that once someone has code that handles one, they can
> repurpose that code for another with minimum breakage.

I'm fine with whatever, I saw a ',' in the bitmap stuff, not really sure
how that ended up being a ' ' in the patch I send out... :-)

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/