RE: [PATCH][2.4] generic cluster APIC support for systems with more than 8 CPUs

From: Martin J. Bligh (mbligh@aracnet.com)
Date: Mon Dec 23 2002 - 02:52:14 EST


> Martin, a couple of days back I posted a kernel IRQ distribution patch with some discussion. There we tried doing the same things you are interested in here. We have made the interval flexible and longer. Also, the randomness of the algorithm has been removed.

Yup, saw it, but haven't given it the inspection it really deserves yet.
That code does need some work, and it sounds like you're doing the
right things to it.

> Also, about fairness: the scheduler will not be able to solve the fairness issues caused by interrupts at all times. For example, at very high interrupt load, some of the CPUs may get 100% busy just servicing interrupts. Here the scheduler cannot do anything. To get fairness, we need the interrupt distribution mechanism to move interrupts as required.

Well, if the scheduler didn't ding the process for time spent in interrupts,
I think it'd work out - it could always run processes on another CPU ;-)
But that may not be practical to do in reality.

> Maybe we can add some generic NUMA awareness to it. But I am not fully aware of how interrupt routing happens in various NUMA systems. If I can get this information, I can look into how we can have generic NUMA support in the new IRQ distribution code.

Mmm... well it's late and I'm tired, but off the top of my head ... you
need to map from each PCI bus to the closest set of CPUs - for me that's
a simple bus_to_node mapping (not sure that bit is added to the topology
infrastructure yet, but it's a trivial patch that's floating around ...
I'll try to dig it out and add it to the 2.5-mjb tree). Then just limit
the distribution for an interrupt to the closest set of CPUs (for UMA SMP
that would just be cpu_online_map), and have another abstracted function that
sets IO-APIC distribution up to a certain CPU (if doing balancing explicitly)
or group thereof. But it's late, so if that makes no sense, I'll take it
all back in the morning ;-)
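
To make the idea above a bit more concrete, here is a rough userspace C
sketch (not kernel code, and not taken from any posted patch). Every name
and value in it is made up for illustration: the bus_to_node_table contents,
the 4-node by 4-CPU layout, the 32-bit cpumask_t, and the helpers
node_to_cpumask(), irq_allowed_cpus() and next_target_cpu(). The fallback to
all online CPUs when a node has no CPU online is also just an assumption.

```c
#include <assert.h>
#include <stdint.h>

#define NR_NODES      4   /* hypothetical: 4 NUMA nodes */
#define CPUS_PER_NODE 4   /* hypothetical: 4 CPUs per node */
#define NR_BUSES      8   /* hypothetical: 8 PCI buses */
#define NR_CPU_BITS   32  /* width of the illustrative mask type */

typedef uint32_t cpumask_t; /* one bit per CPU, for illustration only */

/* Hypothetical table: the node each PCI bus is attached to. */
static const int bus_to_node_table[NR_BUSES] = { 0, 0, 1, 1, 2, 2, 3, 3 };

static int bus_to_node(int bus)
{
    assert(bus >= 0 && bus < NR_BUSES);
    return bus_to_node_table[bus];
}

/* CPUs local to a node: a contiguous block of CPUS_PER_NODE bits. */
static cpumask_t node_to_cpumask(int node)
{
    assert(node >= 0 && node < NR_NODES);
    return (cpumask_t)((1u << CPUS_PER_NODE) - 1) << (node * CPUS_PER_NODE);
}

/*
 * CPUs that an interrupt arriving on this bus may be routed to:
 * the online CPUs of the closest node. On a UMA SMP box there would
 * be a single node, so this collapses to the online map.
 */
static cpumask_t irq_allowed_cpus(int bus, cpumask_t online)
{
    cpumask_t local = node_to_cpumask(bus_to_node(bus)) & online;

    /* Assumption: if no local CPU is online, allow any online CPU. */
    return local ? local : online;
}

/*
 * Explicit balancing step: pick the next allowed CPU after prev,
 * wrapping around. Pass prev = -1 to get the first allowed CPU.
 * Returns -1 if the allowed mask is empty.
 */
static int next_target_cpu(cpumask_t allowed, int prev)
{
    assert(prev >= -1 && prev < NR_CPU_BITS);
    for (int i = 1; i <= NR_CPU_BITS; i++) {
        int cpu = (prev + i) % NR_CPU_BITS;

        if (allowed & ((cpumask_t)1 << cpu))
            return cpu;
    }
    return -1;
}
```

With that layout, an interrupt on bus 2 (node 1) would be confined to CPUs
4-7 as long as at least one of them is online, and next_target_cpu() just
rotates within that set. The function that actually programs the IO-APIC
with the chosen CPU or group is deliberately left out.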

If you're interested in working on it, I'm very happy to test it ...
(should probably be kept separate from your other stuff though).
I'll see if I can find someone in our performance team to evaluate
how your existing patch runs on SMP for us ...

M.

