Re: NMI problems with Dell SMP Xeons

From: Rajesh Shah
Date: Wed Jun 07 2006 - 20:49:17 EST


On Wed, Jun 07, 2006 at 03:18:57PM +0000, Brendan Trotter wrote:
> Hi,
>
> On 6/7/06, Keith Owens <kaos@xxxxxxx> wrote:
> > This problem is not KDB specific, although that is where it was first
> > noticed. Any code that sends a broadcast IPI 2 or an NMI IPI will
> > crash these Dell boxes when there is a mismatch between the cpus known
> > to the BIOS and the cpus known to the OS.
>
> I missed this (broadcast IPI 2 causing problems)...
>
> Before the OS starts all AP CPUs should be in a special halt state
> (specifically "cli; hlt"), where the only interrupt sources that reach
> the CPU are NMI, SMI and INIT. This is why broadcast NMI is unsafe for
> a variety of reasons (not just on Dells).
>
> If the Dell machines allow an IPI 2 to get through then unused CPUs
> aren't in the "cli; hlt" state, and they would also respond to any
> other broadcast IPIs. This would imply that any code that uses
> send_IPI_allbutself(), send_IPI_all() or anything else that sends
> broadcast IPIs would/could cause problems on these Dell machines.

The other possibility is that these CPUs are in the "cli; hlt" state,
but the BIOS didn't install a proper NMI handler for them. A CPU halted
with interrupts disabled does not take maskable IPIs but still takes an
NMI, so this would explain why other "normal" broadcast IPIs didn't
cause problems while the NMI broadcast killed the machine.
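
To make the two explanations easier to compare, here is a toy model in
C. It is only a sketch: the type and function names (ap_state, deliver,
and so on) are made up for illustration and do not come from the kernel
or from any BIOS, and it assumes that a missing handler simply crashes
the box.

```c
#include <stdbool.h>

/* Toy model only: these names are hypothetical, not kernel or BIOS APIs. */

enum irq_kind { IRQ_MASKABLE_IPI, IRQ_NMI };

enum outcome { OUT_NOT_TAKEN, OUT_HANDLED, OUT_CRASH };

struct ap_state {
    bool interrupts_enabled;  /* false after "cli" */
    bool nmi_handler_valid;   /* did the BIOS leave a usable NMI handler? */
    bool irq_handlers_valid;  /* usable handlers for maskable vectors? */
};

/* What happens when one interrupt of the given kind reaches this AP. */
enum outcome deliver(const struct ap_state *ap, enum irq_kind kind)
{
    if (kind == IRQ_NMI) {
        /* NMI is not gated by the interrupt flag, so it is always taken. */
        return ap->nmi_handler_valid ? OUT_HANDLED : OUT_CRASH;
    }

    /* A maskable interrupt is not taken while interrupts are disabled. */
    if (!ap->interrupts_enabled)
        return OUT_NOT_TAKEN;

    return ap->irq_handlers_valid ? OUT_HANDLED : OUT_CRASH;
}
```

With interrupts_enabled = false and nmi_handler_valid = false, the model
gives OUT_NOT_TAKEN for a maskable IPI and OUT_CRASH for an NMI, which
is the pattern reported. With interrupts_enabled = true and no valid
handlers, a maskable broadcast would crash as well, which is Brendan's
scenario.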

Rajesh