Re: [tip:x86/apic] x86: Use EOI register in io-apic on intelplatforms

From: Suresh Siddha
Date: Sat Nov 07 2009 - 02:29:34 EST

On Thu, 2009-11-05 at 22:53 -0800, Maciej W. Rozycki wrote:
> What I mean is if the serial delivery type is used, then an interrupt
> will be acked twice -- once via an EOI message sent from the local APIC
> over the serial bus and then again via the write to the EOI register.

Maciej, Case where we are doing an explicit EOI to the io-apic (using
EOI register) is when the level triggered interrupt gets registered at
the cpu as an edge interrupt (in the local apic's trigger mode

It will arrive as an edge interrupt for two cases.
a) for corner conditions which hit the 82093AA (io-apic version 0x11)
b) with my recent patch in -tip, during a cpu offline, when we send an
ipi (IPI is always registered as an edge triggered at the cpu) to
service the interrupt at the new cpu destination, instead of servicing
at it's old destination cpu (as it has already disabled interrupts and
going down -- like not being on the cpu_online_map etc).

So we are not acking the io-apic twice in this case, as the eoi to the
local apic won't brodcast the eoi to the io-apic (because of the edge
mode in trigger mode register of the local apic).

> There is a race condition here, so if the IRQ line is still/again
> asserted, then most likely two consecutive IRQ messages will be sent by
> the I/O APIC and they may be accepted by two different local units and
> eventually delivered to the OS. Linux will cope, but still this is sloppy
> programming, so it should be best avoided -- if possible that is.

I hope the above clarified.

> >
> > I am not sure if I follow. With the recent changes (tip commit
> > 5231a68614b94f60e8f6c56bc6e3d75955b9e75e), we use ipi on a new cpu to
> > handle a pending level-triggered interrupt on the cpu that is going
> > offline. It's just not only in the case of io-apic erratum for 0x11, we
> > see level triggered interrupt as edge interrupt at the cpu.
> I don't get it, sorry -- an interrupt has its trigger mode implied by the
> IRQ_LEVEL status flag being set or clear in the IRQ's descriptor. What's
> set in the local APIC's TMR does not (or should not) matter IMO.

Same, did the above clarify?

> Has the trigger mode mismatch erratum been seen for FSB delivered
> interrupts anyway?


> Complete the EOI dance for the trigger mode mismatch APIC erratum before
> proceeding to IRQ migration.

I am ok with what you are tying to fix, but not your patch. please see

> + if (use_eoi_reg) {
> + ack_APIC_irq();
> + eoi_ioapic_irq(desc);

We shouldn't do this unconditionally. i.e., we should do this double ack
only when the level-triggered interrupt is seen as an edge triggered
interrupt at the cpu (specified by the trigger mode register in local

Otherwise as you mentioned above, we will see two EOI msg's at the
io-apic (one by the cpu's eoi broadcast and another by explicit eoi to
the io-apic).

Best way to fix the issue you noticed is by simply moving the tail end
of that erratum fix before we try to migrate the irq to a new place.

Do you agree?

From: "Maciej W. Rozycki" <macro@xxxxxxxxxxxxxx>
Subject: x86, io-apic: move the effort of clearing remoteIRR explicitly
before migrating the irq

When the level-triggered interrupt is seen as an edge interrupt, we try
to clear the remoteIRR explicitly (using either an io-apic eoi register
when present or through the idea of changing trigger mode of the io-apic
RTE to edge and then back to level). But this explicit try also needs to
happen before we try to migrate the irq. Otherwise irq migration attempt
will fail anyhow, as it postpones the irq migration to a later attempt
when it sees the remoteIRR in the io-apic RTE still set.

Signed-off-by: "Maciej W. Rozycki" <macro@xxxxxxxxxxxxxx>
Reviewed-by: Suresh Siddha <suresh.b.siddha@xxxxxxxxx>
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index 47d95c3..9a26ea1 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -2583,6 +2583,20 @@ static void ack_apic_level(unsigned int irq)

+ /* Tail end of version 0x11 I/O APIC bug workaround */
+ if (!(v & (1 << (i & 0x1f)))) {
+ atomic_inc(&irq_mis_count);
+ if (use_eoi_reg)
+ eoi_ioapic_irq(desc);
+ else {
+ spin_lock(&ioapic_lock);
+ __mask_and_edge_IO_APIC_irq(cfg);
+ __unmask_and_level_IO_APIC_irq(cfg);
+ spin_unlock(&ioapic_lock);
+ }
+ }
/* Now we can move and renable the irq */
if (unlikely(do_unmask_irq)) {
/* Only migrate the irq if the ack has been received.
@@ -2616,20 +2630,6 @@ static void ack_apic_level(unsigned int irq)
- /* Tail end of version 0x11 I/O APIC bug workaround */
- if (!(v & (1 << (i & 0x1f)))) {
- atomic_inc(&irq_mis_count);
- if (use_eoi_reg)
- eoi_ioapic_irq(desc);
- else {
- spin_lock(&ioapic_lock);
- __mask_and_edge_IO_APIC_irq(cfg);
- __unmask_and_level_IO_APIC_irq(cfg);
- spin_unlock(&ioapic_lock);
- }
- }


To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at
Please read the FAQ at