[RFC][PATCH 0/2] boot interrupts on Intel X58 and 55x0

From: Stefan Assmann
Date: Fri Sep 04 2009 - 12:58:59 EST

This patchset is meant to disable boot interrupts on Intel X58 and 55x0
chipsets (Tylersburg). A lot of effort from Kei Tokunaga has gone into these
patches. Thanks a lot Kei!

The reason why this consists of 2 patches is that the PCI config space of the
configuration device to disable boot interrupts on these chipsets is not
always accessible by default. The first patch is to ensure that the device is
visible while the second patch applies the necessary changes to stop the
generation of boot interrupts. We're not really sure whether the final X58 and
55x0 chipsets have the configuration device visible or not, so patch #1 might
be superfluous, but we've seen at least 2 machines where the device is hidden.
That's one of the reasons why this patchset is marked as RFC. The other reason
is more serious: the onboard NIC (8086:10c9) malfunctions on some of our test
systems if the second patch is applied. It fails to acquire an IP address from
DHCP and we're pretty clueless on this issue right now.
Help is greatly appreciated!

A quick summary of why boot interrupts are better off than on.

Boot interrupts will be generated by the chipset if the interrupt line of a
non-primary IO-APIC is masked and an IRQ arrives there. In that case a boot
interrupt will be forwarded to the PIC _and_ primary IO-APIC. We're not quite
sure why it arrives at the primary IO-APIC as well but it has been observed on
various chipsets.

As there will be no interrupt handler installed (for the boot interrupt) on
the primary IO-APIC, the interrupt will be counted as spurious. If too many
spurious interrupts arrive, the kernel may disable the entire interrupt
line. The problem only shows up if the primary IO-APIC
already has an interrupt handler installed on that line, otherwise that line
would be masked anyway and the boot interrupt silently ignored (which makes it
tricky to observe).
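
To make the accounting above concrete, here is a rough userspace sketch of
how spurious interrupts add up until the line gets disabled. It is only a
model: the names (irq_line, note_interrupt_model, simulate) are made up for
illustration, and the thresholds (a window of 100000 interrupts, of which
more than 99900 went unhandled) are an assumption modelled on what
note_interrupt() in kernel/irq/spurious.c does, not something these patches
define.

```c
#include <stdbool.h>

/* Assumed thresholds, modelled on the kernel's spurious IRQ detection. */
#define IRQ_WINDOW        100000u
#define UNHANDLED_LIMIT    99900u

/* Hypothetical per-line bookkeeping, loosely mirroring struct irq_desc. */
struct irq_line {
	unsigned int irq_count;       /* interrupts seen in current window */
	unsigned int irqs_unhandled;  /* of those, how many nobody handled */
	bool disabled;                /* line has been shut down           */
};

/* Account for one interrupt; returns true once the line is disabled. */
static bool note_interrupt_model(struct irq_line *line, bool handled)
{
	if (line->disabled)
		return true;

	if (!handled)
		line->irqs_unhandled++;
	line->irq_count++;

	/* Only evaluate once a full window of interrupts has been seen. */
	if (line->irq_count < IRQ_WINDOW)
		return false;

	line->irq_count = 0;
	if (line->irqs_unhandled > UNHANDLED_LIMIT)
		line->disabled = true;
	line->irqs_unhandled = 0;

	return line->disabled;
}

/*
 * Feed 'total' interrupts into a fresh line. Every 'handled_every'-th
 * interrupt is claimed by a handler; 0 means none are (pure boot
 * interrupts). Returns whether the line ended up disabled.
 */
static bool simulate(unsigned int total, unsigned int handled_every)
{
	struct irq_line line = { 0, 0, false };

	for (unsigned int i = 1; i <= total; i++) {
		bool handled = handled_every != 0 && (i % handled_every) == 0;

		note_interrupt_model(&line, handled);
	}
	return line.disabled;
}
```

In this model, simulate(100000, 0) ends with the line disabled, while
simulate(100000, 2) (every second interrupt handled) leaves it alone.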

When does this become a problem?

Any device connected to a non-primary IO-APIC (that doesn't use MSIs) will
trigger the generation of boot interrupts if its IO-APIC pin is masked. There
can be many reasons for that, for example:
- The interrupt is shared and a buggy device driver (from another device)
causes the interrupt to get disabled by the kernel.
- The RT kernel masks interrupt lines during handling (threaded IRQ-handling).
- Kei reported issues in the kdump case, where the first kernel disables
the IO-APICs before the second kernel starts booting.
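
The conditions described so far can be written down as two small
predicates. This is just a sketch of the behaviour as we understand it from
observation, not chipset documentation; the struct and function names are
hypothetical.

```c
#include <stdbool.h>

/* Hypothetical description of one interrupt source. */
struct irq_source {
	bool on_primary_ioapic;  /* wired to the primary IO-APIC?   */
	bool uses_msi;           /* MSI bypasses the IO-APIC pins   */
	bool pin_masked;         /* its IO-APIC pin is masked       */
};

/*
 * A boot interrupt is generated when an IRQ arrives at a masked pin of
 * a non-primary IO-APIC. Devices using MSI are not affected.
 */
static bool generates_boot_interrupt(const struct irq_source *s)
{
	return !s->on_primary_ioapic && !s->uses_msi && s->pin_masked;
}

/*
 * The boot interrupt is only noticed (and counted as spurious) if the
 * matching line on the primary IO-APIC already has a handler installed;
 * otherwise that line is masked and the interrupt is silently ignored.
 */
static bool counted_as_spurious(const struct irq_source *s,
				bool primary_line_has_handler)
{
	return generates_boot_interrupt(s) && primary_line_has_handler;
}
```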

It becomes a problem when too many interrupts are counted as spurious (the
boot interrupts cannot be handled because the kernel doesn't expect them) and
the kernel decides it is better to disable the interrupt line.