Re: [PATCH] x86: Document the hotplug code is incompatible with x86 irq handling

From: Eric W. Biederman
Date: Tue Jun 12 2007 - 14:22:19 EST

Pavel Machek <pavel@xxxxxx> writes:

> Hi!
>> I just realized that except for doing the code review and noticing
>> that the current cpu hotplug code is fundamentally incompatible
>> with x86 I haven't done anything about it. So here is my patch
>> to document what is wrong.
>> The current cpu hotplug code requires irqs to be migrated from a cpu
>> outside of irq context. On x86 ioapics simply do not support this,
>> making the code unfixable without major redesign of the generic cpu
>> hotplug code.
>> So this patch makes CPU_HOTPLUG on x86 depend on CONFIG_BROKEN
>> and adds a WARN_ON so people that do enable it are not in doubt about
>> which part of the code is broken, even if it does work for them.
>> --- a/arch/i386/kernel/irq.c
>> +++ b/arch/i386/kernel/irq.c
>> @@ -312,6 +312,19 @@ void fixup_irqs(cpumask_t map)
>> unsigned int irq;
>> static int warned;
>> + /*
>> + * Function is so wrong at so many levels.
>> + * - We migrate irqs that are directed at the cpu we are
>> + * removing.
> Is this about irq pinning?

Sorry. That should have been: We migrate irqs that are not directed at the
cpu we are removing. (We are migrating irqs when it is unnecessary).

>> + * - We cannot safely migrate ioapic irqs on x86 except in
>> + * side of irq context.
> 'inside'?
> Can you be more specific for this one?

Yes inside.

An irq migration currently requires two instances of the irq firing to
complete. Once on the source cpu once on the target cpu.

Migrating irqs while the irq is alive is a royal pain.

>> + * Since someone probably finds this useful just warn very
>> + * loudly until cpu hotplug is redesigned.
>> + */
>> + WARN_ON(1);
> Ugh, no, this does not warn anyone. This will just make people ask me
> why they see stack trace while suspending... and we are not interested
> in the stack trace, anyway.
> printk(KERN_WARNING)?

Because you are calling unfixably broken code. That should be a decent
incentive to do something else won't it?

IOAPICs do not support what the code is doing here. There is lots of
practical evidence including bad experiences and practical tests that
support this.

I suspect the only reason you don't have problems is that the irqs are
already shut down at the source before we get to this code path.

>> index 5ce9443..a61c4f2 100644
>> --- a/arch/x86_64/Kconfig
>> +++ b/arch/x86_64/Kconfig
>> @@ -429,7 +429,7 @@ config NR_CPUS
>> config HOTPLUG_CPU
>> bool "Support for suspend on SMP and hot-pluggable CPUs (EXPERIMENTAL)"
>> - depends on SMP && HOTPLUG && EXPERIMENTAL
>> + depends on SMP && HOTPLUG && EXPERIMENTAL && BROKEN
>> help
> Great, this will force everyone and their dog to enable broken, making
> broken useless. Please don't.

CONFIG_BROKEN is quite likely excessive but the code is totally and
unfixably broken. So it still doesn't feel wrong to me.

Perhaps we should just disable swap suspend on SMP until we get a
design that can be implemented correctly on existing hardware. I am
not happy with people telling me that we must keep broken code because
people with brand new SMP laptops will scream otherwise. Since the
fundamental problems in this code path don't appear to bite people
very frequently I don't mind waiting while an alternative solution is
debugged and tested, but there is no way it makes sense to keep this
code in service more for more than a kernel release or two.

To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at
Please read the FAQ at