Re: [PATCH] x86: Document the hotplug code is incompatible with x86 irq handling

From: Rafael J. Wysocki
Date: Tue Jun 12 2007 - 16:45:58 EST


On Tuesday, 12 June 2007 20:19, Eric W. Biederman wrote:
> Pavel Machek <pavel@xxxxxx> writes:
>
> > Hi!
> >
> >> I just realized that except for doing the code review and noticing
> >> that the current cpu hotplug code is fundamentally incompatible
> >> with x86 I haven't done anything about it. So here is my patch
> >> to document what is wrong.
> >>
> >> The current cpu hotplug code requires irqs to be migrated from a cpu
> >> outside of irq context. On x86 ioapics simply do not support this,
> >> making the code unfixable without major redesign of the generic cpu
> >> hotplug code.
> >>
> >> So this patch makes CPU_HOTPLUG on x86 depend on CONFIG_BROKEN
> >> and adds a WARN_ON so people that do enable it are not in doubt about
> >> which part of the code is broken, even if it does work for them.
> >
> >
> >> --- a/arch/i386/kernel/irq.c
> >> +++ b/arch/i386/kernel/irq.c
> >> @@ -312,6 +312,19 @@ void fixup_irqs(cpumask_t map)
> >> unsigned int irq;
> >> static int warned;
> >>
> >> + /*
> >> + * Function is so wrong at so many levels.
> >> + * - We migrate irqs that are directed at the cpu we are
> >> + * removing.
> >
> > Is this about irq pinning?
>
> Sorry. That should have been: We migrate irqs that are not directed at the
> cpu we are removing. (We are migrating irqs when it is unnecessary).
>
> >> + * - We cannot safely migrate ioapic irqs on x86 except in
> >> + * side of irq context.
> >
> > 'inside'?
> >
> > Can you be more specific for this one?
>
> Yes inside.
>
> An irq migration currently requires two instances of the irq firing to
> complete. Once on the source cpu once on the target cpu.
>
> Migrating irqs while the irq is alive is a royal pain.
>
> >> + * Since someone probably finds this useful just warn very
> >> + * loudly until cpu hotplug is redesigned.
> >> + */
> >> + WARN_ON(1);
> >
> > Ugh, no, this does not warn anyone. This will just make people ask me
> > why they see stack trace while suspending... and we are not interested
> > in the stack trace, anyway.
> >
> > printk(KERN_WARNING)?
>
>
> Because you are calling unfixably broken code. That should be a decent
> incentive to do something else won't it?

Can you please tell me _what_ else can be done?

> IOAPICs do not support what the code is doing here. There is lots of
> practical evidence including bad experiences and practical tests that
> support this.

Well, AFAICS, Suresh has tried to debug one failing case recently without
any consistent conclusions. I don't know of any other test cases (links,
please?).

> I suspect the only reason you don't have problems is that the irqs are
> already shut down at the source before we get to this code path.

You are probably right, but OTOH I've tried the CPU hotplug via the sysfs
interface for _many_ times and it has _never_ failed for me.

> >> index 5ce9443..a61c4f2 100644
> >> --- a/arch/x86_64/Kconfig
> >> +++ b/arch/x86_64/Kconfig
> >> @@ -429,7 +429,7 @@ config NR_CPUS
> >>
> >> config HOTPLUG_CPU
> >> bool "Support for suspend on SMP and hot-pluggable CPUs (EXPERIMENTAL)"
> >> - depends on SMP && HOTPLUG && EXPERIMENTAL
> >> + depends on SMP && HOTPLUG && EXPERIMENTAL && BROKEN
> >> help
> >
> > Great, this will force everyone and their dog to enable broken, making
> > broken useless. Please don't.
>
> CONFIG_BROKEN is quite likely excessive but the code is totally and
> unfixably broken. So it still doesn't feel wrong to me.
>
> Perhaps we should just disable swap suspend on SMP until we get a
> design that can be implemented correctly on existing hardware.

All kinds of suspend, not only hibernation.

> I am not happy with people telling me that we must keep broken code because
> people with brand new SMP laptops will scream otherwise.

Sorry, but that's how it goes.

We've been using that code for more then a year now and I'd expect someone
to tell us that it's wrong a bit earlier.

If you are telling us to drop it now, then _please_ advise what we can use
instead, because we obviously need the functionality.

> Since the fundamental problems in this code path don't appear to bite people
> very frequently I don't mind waiting while an alternative solution is
> debugged and tested, but there is no way it makes sense to keep this
> code in service more for more than a kernel release or two.

Greetings,
Rafael


--
"Premature optimization is the root of all evil." - Donald Knuth
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/