Re: [RESEND] [PATCH v2] [BUGFIX] x86/x86_64: fix CPU offlining triggered "active" device IRQ interruption

From: Eric W. Biederman
Date: Wed Jun 03 2009 - 17:13:41 EST


Gary Hade <garyhade@xxxxxxxxxx> writes:

> Correct, after the fix was applied my testing did _not_ show
> the lockups that you are referring to. I wonder if there is a
> chance that the root cause of those old failures and the root
> cause of the issue that my fix addresses are the same?
>
> Can you provide the test case that demonstrated the old failure
> cases so I can try it on our systems? Also, do you recall what
> mainline version demonstrated the old failure?

The irq migration had already been moved to interrupt context by the
time I started working on it. And I managed to verify that there were
indeed problems with moving it out of interrupt context before my code
was merged.

So if you want to reproduce it, reduce your irq migration to the essentials.
Set IRQ_MOVE_PCNTXT, and always migrate the irqs from process context
immediately.

Then migrate an irq that fires at a high rate rapidly from one cpu to
another.
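
For the rapid migration step, a userspace sketch along these lines might
be enough. It assumes the affinity is changed by writing a hex cpu mask to
/proc/irq/<n>/smp_affinity; the helper names (cpu_mask_str, write_affinity,
bounce_irq) and the choice of irq, cpus, and round count are illustrative
and not part of any patch in this thread.

```c
/* Sketch only: bounce one irq between two cpus by rewriting its
 * affinity mask. Helper names are hypothetical. */
#include <stdio.h>
#include <string.h>

/* Format the hex affinity mask for a single cpu (cpus 0..31 only). */
static int cpu_mask_str(unsigned int cpu, char *buf, size_t len)
{
	if (cpu >= 32)
		return -1;
	snprintf(buf, len, "%x", 1u << cpu);
	return 0;
}

/* Write one mask string to an affinity file; returns 0 on success. */
static int write_affinity(const char *path, const char *mask)
{
	size_t n = strlen(mask);
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;
	ok = fwrite(mask, 1, n, f) == n;
	/* The write reaches the kernel on flush, so check fclose() too. */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}

/* Alternate the affinity between cpu_a and cpu_b for 'rounds' writes,
 * starting with cpu_a. Stops at the first failed write. */
static int bounce_irq(const char *path, unsigned int cpu_a,
		      unsigned int cpu_b, unsigned long rounds)
{
	char a[16], b[16];
	unsigned long i;

	if (cpu_mask_str(cpu_a, a, sizeof(a)) ||
	    cpu_mask_str(cpu_b, b, sizeof(b)))
		return -1;
	for (i = 0; i < rounds; i++)
		if (write_affinity(path, (i & 1) ? b : a))
			return -1;
	return 0;
}
```

A call such as bounce_irq("/proc/irq/<n>/smp_affinity", 0, 1, 1000000),
run as root with <n> replaced by an irq that is firing at a high rate,
would drive the migration path back to back. The loop only exercises the
affinity interface; whether the move happens in process context still
depends on IRQ_MOVE_PCNTXT being set in the kernel under test.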

Right now you are insulated from most of the failures because you still
don't have IRQ_MOVE_PCNTXT. So you are only really testing your new code
in the cpu hotunplug path.

Now that I look at it in more detail, you are doing a double
mask_IO_APIC_irq and unmask_IO_APIC_irq on the fast path and
duplicating the pending irq check. All of which are pretty atrocious
in and of themselves.

Eric


