Re: 2.4.19-rc3-ac2 SMP

From: James Cleverdon (jamesclv@us.ibm.com)
Date: Thu Jul 25 2002 - 15:48:26 EST


On Wednesday 24 July 2002 10:28 am, Mikael Pettersson wrote:
> On Wed, 24 Jul 2002 17:26:49 +0200 (SAST), Zwane Mwaikambo wrote:
> >i'd vote for that =) except for one thing.
> >
> >+ /* APICs tend to spasm when they get errors. Disable the error
> > intr. */ + apic_write_around(APIC_LVTERR, ERROR_APIC_VECTOR |
> > APIC_LVT_MASKED);
> >
> >Isn't that a bit drastic?
>
> Drastic is an understatement. Try "gross". Sane machines running correct
> code shouldn't throw local APIC errors. If something's causing errors,
> that something should be fixed, not hidden.
>
> I hope that was just a temporary debug hack and not part of the design...
>
> /Mikael

On the contrary, when Intel moved the local APIC from a separate chip onto the
CPU around the time of the P54C, they hobbled it. Formerly, it could accept
and latch any number of interrupts because it contained three bit vectors
that could store all the necessary state info. The P54C (and later) version
had two latches per interrupt level. The level was defined as the top nibble
of the interrupt vector. So, P54Cs could only latch two interrupts for, say,
the 0x31-0x3F range for ISA IRQs. Too bad if three 0x3X interrupts arrive.
Number 3 cannot be latched.

Intel added new error states to the local APIC and the bus protocol to allow
for interrupts to _not_ be delivered, thanks to the latch limit. On a busy
system with lots of interrupts, you will sometimes see several of these
receive accept errors per day. There is nothing you can do to fix the
condition, aside from processing all . It really is more of a warning than
an error.

On our NUMA P6 box, we found that the local APICs would occasionally start
spasming with error interrupts. An APIC bus analyzer didn't show any kind of
errors on the APIC bus. They would just weird out and all attempts to clear
the error had no effect. We never did find a solution to that one or get an
adequate explanation from Intel. The only kludge that worked was to turn off
the APIC error interrupt.

Naturally, the cleaned up version of the apic_state_dump patch wouldn't do
that, or would make it an option.

-- 
James Cleverdon
IBM xSeries Linux Solutions
{jamesclv(Unix, preferred), cleverdj(Notes)} at us dot ibm dot com

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Tue Jul 30 2002 - 14:00:21 EST