Re: [BUG 2.6.25-rc3] scheduler/hotplug: some processes aredealocked when cpu is set to offline

From: Yi Yang
Date: Tue Mar 04 2008 - 00:22:26 EST


On Mon, 2008-03-03 at 22:53 +0800, Yi Yang wrote:
> On Mon, 2008-03-03 at 13:02 +0100, Dmitry Adamushko wrote:
> > On 03/03/2008, Ingo Molnar <mingo@xxxxxxx> wrote:
> > >
> > > * Dmitry Adamushko <dmitry.adamushko@xxxxxxxxx> wrote:
> > >
> > > > per_cpu(watchdog_task, hotcpu) = NULL;
> > > > + mlseep(1);
> > >
> > >
> > > that wont build very well ...
> >
> > yeah, I forgot to mention that it's not even compile-tested :-/
> > I re-created it from scratch instead of looking for the original one.
> >
> > please, this one (again, not compile-tested)
> >
> > --- softlockup-prev-2.c 2008-03-03 12:38:36.000000000 +0100
> > +++ softlockup.c 2008-03-03 13:00:20.000000000 +0100
> > @@ -294,6 +294,7 @@ cpu_callback(struct notifier_block *nfb,
> > case CPU_DEAD_FROZEN:
> > p = per_cpu(watchdog_task, hotcpu);
> > per_cpu(watchdog_task, hotcpu) = NULL;
> > + msleep(1);
> > kthread_stop(p);
> > break;
> > #endif /* CONFIG_HOTPLUG_CPU */
>
> I don't think it can fix this issue, it only gives one chance to
> scheduler, i think there are another potential and very serious issues
> inside of scheduler or locking or what else we don't know.
>
> Maybe migration is a doubtful point as Gautham mentioned.

That issue is still there after the above patch is applied.

I found that [watchdog/#] is indeed migrated to other cpu because
migration_call is called before cpu_callback, i think this is the real
root cause very very possibly.

I suggest we can develop a new notifier infrastructure in which one
caller can specify whether it is kthread_stopping a cpu-bind kthread
so that such notifier callbacks can be executed prior to other
callbacks.
> >

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/