[CFT] xAPIC patches break Dell 420

From: James Cleverdon (jamesclv@us.ibm.com)
Date: Wed Dec 05 2001 - 23:44:30 EST


Hi,

The following patches are meant for the forthcoming Summit chipset for a new
IBM NUMA box. The problem is that bcrl at Red Hat found that they cause a
Dell 420 box to take an infinite number of APIC error interrupts with a value
of 8: Receive Accept Error. This happens around the time that network
devices are being probed. We've tried the patches on all the hardware handy
and can't make it break. Could someone out there give it a try on different
SMP boxes? Thanks!
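
For anyone trying to reproduce this and reading the logged error value, here is a minimal sketch of decoding the low byte of the local APIC Error Status Register. The bit layout is the one documented for P6/Pentium 4 era local APICs (bit 4 reserved), which is an assumption about the hardware in question. The names esr_bit_names and esr_first_error are hypothetical and are not taken from the patches.

```c
#include <stdio.h>

/*
 * Names for bits 0..7 of the local APIC Error Status Register.
 * Assumed layout: P6/Pentium 4 era local APIC, where bit 4 is reserved.
 */
static const char *const esr_bit_names[8] = {
    "Send Checksum Error",       /* bit 0, value 0x01 */
    "Receive Checksum Error",    /* bit 1, value 0x02 */
    "Send Accept Error",         /* bit 2, value 0x04 */
    "Receive Accept Error",      /* bit 3, value 0x08 */
    "Reserved",                  /* bit 4, value 0x10 */
    "Send Illegal Vector",       /* bit 5, value 0x20 */
    "Receive Illegal Vector",    /* bit 6, value 0x40 */
    "Illegal Register Address",  /* bit 7, value 0x80 */
};

/* Hypothetical helper: name of the lowest set error bit, or "No error". */
const char *esr_first_error(unsigned int esr)
{
    for (int bit = 0; bit < 8; bit++) {
        if (esr & (1u << bit))
            return esr_bit_names[bit];
    }
    return "No error";
}

/* Hypothetical helper: print every error bit set in the ESR value. */
void esr_print(unsigned int esr)
{
    for (int bit = 0; bit < 8; bit++) {
        if (esr & (1u << bit))
            printf("ESR bit %d: %s\n", bit, esr_bit_names[bit]);
    }
}
```

Calling esr_first_error(8) picks out bit 3, which matches the Receive Accept Error reported on the Dell box.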

Background:

These patches are to support up to 16 CPUs in a Foster/xAPIC NUMA box. The
xAPICs are used in parallel mode (i.e. they send the interrupt message down a
system bus), much like the SAPICs for IA64. As such, they share a problem
with SAPICs: only one CPU per cluster (usually 4 CPUs per cluster) will be
hit by all the interrupts. This is because Linux doesn't change the TPR or
XTPR registers after zeroing them at boot. So, all interrupts go to
whichever CPU is picked by the host bridge's tie-breaker logic. I've got a
simple round-robin function in the patch to help distribute the load a bit
better. It could certainly use some improvement later, but right now it
works OK.
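
To illustrate the idea only, here is a minimal sketch of round-robin CPU selection within one cluster. It is not the function from the patch. The struct cluster_rr, the function rr_next_cpu, and the fixed CPUS_PER_CLUSTER of 4 are all assumptions made for the example, and it covers just the selection step, not how the chosen CPU is then programmed as the interrupt target.

```c
#define CPUS_PER_CLUSTER 4  /* assumption: the "usually 4 CPUs per cluster" case */

/* Hypothetical per-cluster state for the sketch. */
struct cluster_rr {
    unsigned int online_mask;  /* bit n set => CPU n of this cluster is online */
    unsigned int last;         /* index of the CPU chosen last time */
};

/*
 * Pick the next online CPU after the one chosen last time, wrapping
 * around the cluster. Returns the CPU index within the cluster, or -1
 * if no CPU in the cluster is online.
 */
int rr_next_cpu(struct cluster_rr *c)
{
    for (unsigned int step = 1; step <= CPUS_PER_CLUSTER; step++) {
        unsigned int cpu = (c->last + step) % CPUS_PER_CLUSTER;

        if (c->online_mask & (1u << cpu)) {
            c->last = cpu;
            return (int)cpu;
        }
    }
    return -1;
}
```

Starting the search one past the last choice means each online CPU gets a turn before any is picked twice, and offline CPUs are simply skipped.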

I've hijacked Martin Bligh's CONFIG_MULTIQUAD code to do this and, in the
process, broken it for its original target hardware. So, the dozen or so
folks out there who run Linux instead of Dynix/ptx on their NUMA-Q boxes
needn't try these patches yet. 8^)

Thanks again!

-- 
James Cleverdon, IBM xSeries Platform (NUMA), Beaverton
jamesclv@us.ibm.com   |   cleverdj@us.ibm.com




