Re: [PATCH 3/3] [BUGFIX] x86/x86_64: fix IRQ migration triggered active device IRQ interruption

From: Yinghai Lu
Date: Sat May 23 2009 - 20:25:02 EST


On Wed, Apr 29, 2009 at 10:46 AM, Eric W. Biederman
<ebiederm@xxxxxxxxxxxx> wrote:
> Gary Hade <garyhade@xxxxxxxxxx> writes:
>
>>
>> So, I just rebuilt after _really_ applying the patch and got
>> the following result, which is probably what you intended.
>
> Ok.  Good to see.
>
>>> >> I propose detecting the cases that we know are safe to migrate in
>>> >> process context, aka logical delivery with less than 8 cpus aka "flat"
>>> >> routing mode and modifying the code so that those work in process
>>> >> context and simply deny cpu hotplug in all of the rest of the cases.
>>> >
>>> > Humm, are you suggesting that CPU offlining/onlining would not
>>> > be possible at all on systems with >8 logical CPUs (i.e. most
>>> > of our systems) or would this just force users to separately
>>> > migrate IRQ affinities away from a CPU (e.g. by shutting down
>>> > the irqbalance daemon and writing to /proc/irq/<irq>/smp_affinity)
>>> > before attempting to offline it?
>>>
>>> A separate migration, for those hard to handle irqs.
>>>
>>> The newest systems have iommus that irqs go through or are using MSIs
>>> for the important irqs, and as such can be migrated in process
>>> context.  So this is not a restriction for future systems.
>>
>> I understand your concerns but we need a solution for the
>> earlier systems that does NOT remove or cripple the existing
>> CPU hotplug functionality.  If you can come up with a way to
>> retain CPU hotplug function while doing all IRQ migration in
>> interrupt context I would certainly be willing to try to find
>> some time to help test and debug your changes on our systems.
>
> Well that is ultimately what I am looking towards.
>
> How do we move to a system that works by design, instead of
> one with design goals that are completely conflicting?
>
> Thinking about it, we should be able to preemptively migrate
> irqs in the hook I am using that denies cpu hotplug.
>
> If they don't migrate after a short while I expect we should
> still fail but that would relieve some of the pain, and certainly
> prevent a non-working system.
>
> There are little bits we can tweak like special casing irqs that
> no-one is using.
>
> My preference here is that I would rather deny cpu hot-unplug than
> have the non-working system problems that you have seen.

And use delayed work to offline the cpu later, after the irqs get moved to another cpu?
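
A minimal userspace sketch of that idea combined with the "fail if they don't migrate after a short while" rule quoted above. This is not kernel code and not taken from any patch in this thread: struct cpu_model, retry_tick() and deferred_offline() are hypothetical names, and each loop iteration merely stands in for one firing of a delayed-work callback.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical model of a CPU that is being taken offline. */
struct cpu_model {
	int pending_irqs;	/* IRQs still targeted at this CPU */
	int moved_per_tick;	/* IRQs that finish migrating per retry */
	int online;		/* 1 while the CPU is online */
};

/* One retry tick: stands in for a delayed-work callback firing. */
static void retry_tick(struct cpu_model *cpu)
{
	cpu->pending_irqs -= cpu->moved_per_tick;
	if (cpu->pending_irqs < 0)
		cpu->pending_irqs = 0;
}

/*
 * Offline the CPU only once no IRQ targets it any more.
 * Returns 0 on success, or -EBUSY if IRQs are still pending
 * after max_ticks retries (the CPU then stays online).
 */
static int deferred_offline(struct cpu_model *cpu, int max_ticks)
{
	assert(max_ticks >= 0);

	for (int tick = 0; tick <= max_ticks; tick++) {
		if (cpu->pending_irqs == 0) {
			cpu->online = 0;
			return 0;
		}
		if (tick < max_ticks)
			retry_tick(cpu);
	}
	return -EBUSY;
}
```

In a real implementation the retry would presumably be rescheduled with the kernel's delayed-work machinery rather than run in a loop, and the "pending" count would come from the irq descriptors; both are assumptions here.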

YH