Re: smp_call_function_single lockups

From: Linus Torvalds
Date: Tue Mar 31 2015 - 11:12:55 EST


On Mon, Mar 30, 2015 at 8:15 PM, Chris J Arges
<chris.j.arges@xxxxxxxxxxxxx> wrote:
>
> I modified the posted patch with the following:

Actually, in addition to Ingo's patches (and the irq printout), which
you should try first: if none of that really gives any different
behavior, you can modify that ack_APIC_irq() debugging code a bit more:

> diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
> index bf32309..dc3e192 100644
> --- a/arch/x86/include/asm/apic.h
> +++ b/arch/x86/include/asm/apic.h
> @@ -441,7 +441,7 @@ static inline void ack_APIC_irq(int vector)
>  	if (vector >= 16) {
>  		unsigned v = apic_read(APIC_ISR + ((vector & ~0x1f) >> 1));
>  		v >>= vector & 0x1f;
> -		WARN_ON_ONCE(!(v & 1));
> +		WARN(!(v & 1), "ack_APIC_irq: vector = %0x\n", vector);
>  	}
>  	/*
>  	 * ack_APIC_irq() actually gets compiled as a single instruction
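
For reference, the ISR lookup in that hunk picks one 32-bit register per
32 vectors (the registers sit 0x10 bytes apart) and then shifts the
vector's bit down to bit 0. A minimal userspace sketch of just that
arithmetic, assuming APIC_ISR is 0x100 as in apicdef.h; the helper names
isr_reg_offset() and isr_bit() are made up for illustration and are not
kernel functions:

```c
#include <stdint.h>

/* Assumed offset of the first In-Service Register (0x100 in apicdef.h). */
#define APIC_ISR 0x100u

/*
 * Hypothetical helper: register offset that holds the ISR bit for
 * 'vector'. (vector & ~0x1f) is 32 * (vector / 32); halving it gives
 * the 0x10-byte stride between consecutive ISR registers.
 */
static inline uint32_t isr_reg_offset(unsigned int vector)
{
	return APIC_ISR + ((vector & ~0x1fu) >> 1);
}

/* Hypothetical helper: bit position of 'vector' within that register. */
static inline unsigned int isr_bit(unsigned int vector)
{
	return vector & 0x1fu;
}
```

So vector 2 (NMI) would be looked up as bit 2 of the register at 0x100,
and vector 0xfd as bit 29 of the register at 0x170.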

So what I'd suggest doing is:

- change the test of "vector >= 16" to just "vector >= 0".

We still have "-1" as the "unknown vector" thing, but I think only
the ack_bad_irq() thing calls it, and that should print out its own
message if it ever triggers, so it isn't an issue.

The reason for the ">= 16" was kind of bogus - the first 16 vectors
are system vectors, but we definitely shouldn't ack the apic for such
vectors anyway, so giving a warning for them is very much appropriate.
In particular, vector 2 is NMI, and maybe we do ACK it incorrectly.

- add a "return" if the warning triggers, and simply don't do the
actual ACK cycle at all if the ISR bit is clear.

IOW, make it do "if (WARN(..)) return;"

Now, we might get the vector number wrong for some reason, and in that
case not ACK'ing at all might cause problems too, but it would be
interesting to see if it changes behavior wrt the lockup.
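
To make the two changes concrete, here is a rough userspace sketch of the
control flow, not the actual kernel patch. Everything in it is a stand-in:
mock_isr[] plays the role of the eight ISR registers, mock_eoi_writes
counts simulated EOI writes, and sketch_ack_irq() is a hypothetical name.
The fprintf() stands in for WARN(), and the mock clears the named vector's
bit on ACK, whereas a real EOI clears the highest-priority in-service bit.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the eight 32-bit ISR registers. */
static uint32_t mock_isr[8];

/* Counts how many simulated EOI writes were actually performed. */
static unsigned int mock_eoi_writes;

/* Is the in-service bit for 'vector' set? Only valid for vector >= 0. */
static bool isr_bit_set(int vector)
{
	uint32_t v = mock_isr[(vector >> 5) & 7];

	return (v >> (vector & 0x1f)) & 1;
}

/* Returns true if the ACK was done, false if it was skipped. */
static bool sketch_ack_irq(int vector)
{
	/* Change 1: check every known vector, not just vector >= 16. */
	if (vector >= 0) {
		/* Change 2: warn and skip the ACK if the ISR bit is clear. */
		if (!isr_bit_set(vector)) {
			fprintf(stderr, "ack_APIC_irq: vector = %x\n",
				(unsigned int)vector);
			return false;
		}
		/* Simplification: clear the named bit, see the note above. */
		mock_isr[(vector >> 5) & 7] &= ~(UINT32_C(1) << (vector & 0x1f));
	}

	/* vector == -1 ("unknown vector") still falls through to the ACK. */
	mock_eoi_writes++;
	return true;
}
```

With only vector 35 marked in-service, the first sketch_ack_irq(35) does
the ACK, a second one warns and returns without it, and sketch_ack_irq(2)
warns as well because the NMI vector never had its ISR bit set.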

I don't have any other ideas at the moment, but hopefully the
suggested changes by me and Ingo will give some more data to go on and
clarify what might be going on.

Linus