Re: RT interrupt handling

From: Kyle Moffett
Date: Fri Apr 28 2006 - 19:19:48 EST


On Apr 28, 2006, at 17:08:59, Darren Hart wrote:
> I ran into a situation where binding a realtime testsuite to CPU 0
> (on a 4-way Opteron machine) locked the machine hard, while binding
> it to CPU 2 worked fine. Some investigation suggests that the
> interrupt handlers for eth0 and ioc0 (IRQs 24 and 26) had their
> smp_affinity mask set to only CPU 0. With the test case running
> threads at RT priorities in the 90s and the IRQs running in the
> ~40s (don't recall, somewhere around there I think), it isn't
> surprising that the machine locked up.

> I'd like to hear people's thoughts on the following:

> o Why would those IRQs be bound to just CPU 0? Why not all CPUs?

Are you running an IRQ balancing daemon of some sort? (Or the in-kernel IRQ balancer?) I believe those alter the CPU affinity of the various interrupt threads to optimize IRQ handling efficiency, which could explain why the masks weren't what you expected.
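
For anyone who wants to check this on a running box, here's a minimal sketch (mine, not from the thread; IRQ 24 is just the eth0 number from Darren's report) that reads the affinity bitmask the kernel exposes under /proc:

    /* Read the CPU affinity bitmask for one IRQ.  Compile with any
     * C compiler; the IRQ number is an assumption from this thread. */
    #include <stdio.h>

    int main(void)
    {
            char mask[64];
            FILE *f = fopen("/proc/irq/24/smp_affinity", "r");

            if (!f) {
                    perror("fopen");
                    return 1;
            }
            if (fgets(mask, sizeof(mask), f))
                    printf("IRQ 24 affinity mask: %s", mask);
            fclose(f);
            return 0;
    }

On the machine Darren describes, a mask of "1" would confirm the IRQ is pinned to CPU 0 only.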


> o Is it reasonable to extend the smp_affinity of all interrupts to
> all CPUs to minimize this type of problem?

Probably so, although I would bet the default mask already includes all CPUs (unless I misunderstand the situation), and something like a balancing daemon narrowed it afterwards.
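
If the mask really has been narrowed, widening it again is just a write to the same /proc file. A hedged sketch, assuming the 4-way box from the report so "f" covers CPUs 0-3, and requiring root:

    /* Widen IRQ 24's affinity to CPUs 0-3.  The IRQ number and the
     * "f" mask are assumptions taken from this thread, not a general
     * recipe; adjust both for your machine. */
    #include <stdio.h>

    int main(void)
    {
            FILE *f = fopen("/proc/irq/24/smp_affinity", "w");

            if (!f) {
                    perror("fopen");        /* typically needs root */
                    return 1;
            }
            fputs("f\n", f);                /* hex bitmask: CPUs 0-3 */
            if (fclose(f) != 0) {
                    perror("fclose");       /* write committed at close */
                    return 1;
            }
            return 0;
    }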


> o Should a userspace RT task be able to take down the system? Do we
> roll with the Spider-Man adage "With great power comes great
> responsibility" when discussing RT systems, or should we consider
> some kind of priority boosting mechanism for kernel services that
> must run every so often to keep the system alive?

The general consensus is that Linux RT code strives to be as hard-RT as possible, which means that if you prioritize your code over the networking interrupt, you should expect to get runtime even when the network card has work to do. If you don't want that behavior, don't set the priorities that way :-D.
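
To make the "great responsibility" point concrete, here's a sketch (my numbers, not Darren's exact setup) of a userspace SCHED_FIFO task that deliberately parks itself below the IRQ threads, which were reportedly running somewhere in the 40s:

    /* A userspace RT task that stays below the IRQ threads so it
     * cannot starve them.  Priority 30 is an assumption: anything
     * under the ~40s reported in this thread lets eth0/ioc0 preempt. */
    #include <sched.h>
    #include <stdio.h>

    int main(void)
    {
            struct sched_param sp = { .sched_priority = 30 };

            if (sched_setscheduler(0, SCHED_FIFO, &sp) == -1) {
                    perror("sched_setscheduler");  /* needs root */
                    return 1;
            }
            /* ... realtime work here.  Priorities in the 90s, as in
             * the test case, would starve the IRQ threads and can
             * wedge the box exactly as described above. */
            return 0;
    }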

Cheers,
Kyle Moffett
