Re: [PATCH] cpu hotplug fixes for dependent_sleeper andwake_sleeping_dependent

From: Rusty Russell
Date: Wed Sep 01 2004 - 18:55:30 EST


On Mon, 2004-08-30 at 23:29, Ingo Molnar wrote:
> * Nathan Lynch <nathanl@xxxxxxxxxxxxxx> wrote:
>
> > To recap, offlining a cpu with current bk results in the "Aiee,
> > killing interrupt handler!" panic from do_exit(). This seems to be
> > triggered only with CONFIG_PREEMPT and CONFIG_SCHED_SMT both enabled.
> > I believe the problem is that when do_stop() calls schedule(),
> > dependent_sleeper() drops the "offline" cpu's rq->lock and never
> > reacquires it.
> >
> > The following seems to work (tested on ppc64). Is there a better way?
>
> > - if (!(sd->flags & SD_SHARE_CPUPOWER))
> > + if (!(sd->flags & SD_SHARE_CPUPOWER) || cpu_is_offline(this_cpu))
>
> > + if (!(sd->flags & SD_SHARE_CPUPOWER) || cpu_is_offline(this_cpu))
>
> it would really be nice to do this without any runtime overhead. Rusty?

If the scheduling topology is updated atomically (ie. inside
__cpu_disable), this would not happen; there are patches around to do
this and I think longer term this is the correct fix. However, I
believe a good current fix is to merely ensure that the current cpu is
always set in dependent_sleeper(), something like:L

/*
* The same locking rules and details apply as for
* wake_sleeping_dependent():
*/
spin_unlock(&this_rq->lock);
cpus_and(sibling_map, sd->span, cpu_online_map);
+ /* Include this CPU, for special case of going us dying */
+ cpu_set(this_cpu, sibling_map);
for_each_cpu_mask(i, sibling_map)
spin_lock(&cpu_rq(i)->lock);
cpu_clear(this_cpu, sibling_map);

Not quite free, but very cheap...
Rusty.
--
Anyone who quotes me in their signature is an idiot -- Rusty Russell

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/