Re: Unhandled IRQs on AMD E-450

From: Clemens Ladisch
Date: Sat Dec 10 2011 - 12:59:14 EST


Jeroen Van den Keybus wrote:
> [...]
> - CPU services the IRQ, and does at least one (slow) PCI read to have
> the device deassert its IRQ line. In practice, more PCI read/writes
> are needed, requiring the bridge to do some PCIe traffic generation.
> - Bridge sees the IRQ line transition and signals Deassert. This
> message has only a few usecs to arrive at the I/O-APIC.
> - _However_ the CPU has by and large already handled the IRQ and gets
> interrupted again before the Deassert ever gets out. The resulting PCI
> bus traffic further delays the Deassert message (due to e.g. PCIe
> transmit credit exhaustion).
>
> My idea is that if we did not immediately hammer the bridge with
> PCIe transactions, the Deassert message might eventually arrive?

PCIe messages are somewhat ordered; posted memory writes are allowed,
but IIRC a read transaction serializes all previous and following
transactions, assuming that all involved devices work correctly.

> Also, is there any control by Linux of the credits issued?

I don't think these can be controlled by software. The hardware is
supposed to get them correct.

> I therefore patched the polling system by detecting a stuck IRQ
> already after 10 unserviced IRQs. Then the polling system will take
> over for 50 cycles (5 seconds), after which the IRQ is reenabled.
>
> [ 1607.941232] irq 19: nobody cared (try booting with the "irqpoll" option)
> [ 1613.040185] Reenabling IRQ.
> [ 1908.541558] irq 19: nobody cared (try booting with the "irqpoll" option)
> [ 1913.640088] Reenabling IRQ.
> [ 2319.361659] irq 19: nobody cared (try booting with the "irqpoll" option)
> [ 2324.460064] Reenabling IRQ.
> [ 2782.285470] irq 19: nobody cared (try booting with the "irqpoll" option)
> [ 2787.384222] Reenabling IRQ.
> [ 3485.689347] irq 19: nobody cared (try booting with the "irqpoll" option)
> [ 3490.788079] Reenabling IRQ.
> [ 3810.336883] irq 19: nobody cared (try booting with the "irqpoll" option)

So the IRQ _does_ get unstuck eventually; I didn't expect that.

So either the ASM1083 delays its Deassert messages, or it is just way
too slow to react to changes in its PCI interrupt line inputs.

I'd guess that you can make the polling time shorter; a few milliseconds
should be enough.
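
To make the knobs concrete, here is a rough user-space sketch of the
scheme as you describe it. It is not your patch and not the kernel's
actual spurious-IRQ code; all names (irq_state, note_interrupt,
poll_tick, the macros) are made up for illustration. The thresholds
come from your description: 10 unserviced IRQs, 50 polling cycles, and
a 100 ms poll interval inferred from "50 cycles (5 seconds)". Resetting
the counter on a handled interrupt is my assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical constants; values taken from the description in this thread. */
#define STUCK_THRESHOLD  10   /* unserviced IRQs before the line counts as stuck */
#define POLL_CYCLES      50   /* polling cycles before the IRQ is reenabled */
#define POLL_INTERVAL_MS 100  /* inferred: 5 seconds / 50 cycles */

enum irq_mode { IRQ_ENABLED, IRQ_POLLING };

struct irq_state {
    enum irq_mode mode;
    int unhandled;   /* unserviced interrupts seen in a row (assumption) */
    int polls_left;  /* polling cycles remaining while the IRQ is disabled */
};

/* Called per interrupt; 'handled' says whether any handler claimed it. */
static void note_interrupt(struct irq_state *s, bool handled)
{
    if (s->mode != IRQ_ENABLED)
        return;                       /* line is masked, nothing to count */
    if (handled) {
        s->unhandled = 0;
        return;
    }
    if (++s->unhandled >= STUCK_THRESHOLD) {
        /* "nobody cared": mask the line and let polling take over */
        s->mode = IRQ_POLLING;
        s->polls_left = POLL_CYCLES;
        s->unhandled = 0;
    }
}

/* Called once per poll timer tick while the line is masked. */
static void poll_tick(struct irq_state *s)
{
    if (s->mode != IRQ_POLLING)
        return;
    assert(s->polls_left > 0);
    if (--s->polls_left == 0)
        s->mode = IRQ_ENABLED;        /* "Reenabling IRQ." */
}

/* Total time the line stays masked, in milliseconds. */
static int poll_duration_ms(void)
{
    return POLL_CYCLES * POLL_INTERVAL_MS;
}
```

With these values the line stays masked for 5000 ms, which matches the
roughly 5.1 s between "nobody cared" and "Reenabling IRQ." in your log.
Shortening the polling time would mean lowering POLL_INTERVAL_MS or
POLL_CYCLES in this sketch.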


Your patch might be useful to others afflicted with this chip. Could
you publish it?


Regards,
Clemens