Re: [PATCH -mm resend] cpuhotplug: introduce try_get_online_cpus() take 3

From: Gautham R Shenoy
Date: Mon Jun 15 2009 - 00:04:31 EST


On Thu, Jun 11, 2009 at 11:50:15AM -0700, Paul E. McKenney wrote:
> On Thu, Jun 11, 2009 at 04:41:42PM +0800, Lai Jiangshan wrote:
> > Andrew Morton wrote:
> > >
> > > I still think we should really avoid having to do this. trylocks are
> > > nasty things.
> > >
> > > Looking at the above, one would think that a correct fix would be to fix
> > > the bug in "thread 2": take the locks in the correct order? As
> > > try_get_online_cpus() doesn't actually have any callers, it's hard to
> > > take that thought any further.
> >
> > Sometimes, we can not reorder the locks' order.
> > try_get_online_cpus() is really needless when no one uses it.
> >
> > Paul's expedited RCU V7 may need it:
> > http://lkml.org/lkml/2009/5/22/332
> >
> > So this patch can be omitted when Paul does not use it.
> > It's totally OK for me.
>
> Although my patch does not need it in and of itself, if someone were
> to hold a kernel mutex across synchronize_sched_expedited(), and also
> acquire that same kernel mutex in a hotplug notifier, the deadlock that
> Lai calls out would occur.
>
> Even if no one uses synchronize_sched_expedited() in this manner, I feel
> that it is good to explore the possibility of dealing with it. As
> Andrew Morton pointed out, CPU-hotplug locking is touchy, so on-the-fly
> fixes are to be avoided if possible.

Agreed. Though I like the atomic refcount version of
get_online_cpus()/put_online_cpus() that Lai has proposed.

Anyway, when try_get_online_cpus() was proposed last year, it was
meant to be used in worker thread context.

Back then we could not call get_online_cpus() from worker thread
context, for fear of the following deadlock during a cpu-hotplug
operation.

Thread 1: (cpu_offline)            | Thread 2: (worker_thread)
-----------------------------------------------------------------------
cpu_hotplug_begin();               |
.                                  |
.                                  | get_online_cpus(); /* Blocks */
.                                  |
.                                  |
CPU_DEAD:                          |
workqueue_cpu_callback();          |
cleanup_workqueue_thread()         |
/* Waits for worker thread         |
 * to finish.                      |
 * Hence a deadlock.               |
 */                                |
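
To make the scenario above concrete, here is a rough user-space sketch
of why a try variant avoids the problem. It is only a single-threaded
model, not the kernel code and not Lai's patch: struct hotplug_model
and every model_*() function are hypothetical names, and the real
locking (mutex, active writer task, atomics) is left out. The point is
that the worker never blocks on the hotplug writer, so it always runs
to completion and a later wait for it cannot deadlock.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model; NOT the kernel's cpu_hotplug code. */
struct hotplug_model {
	int  refcount;       /* readers currently pinning the online-CPU set */
	bool writer_active;  /* true between model_hotplug_begin() and _done() */
};

/* Non-blocking reader entry: fail instead of waiting for the writer. */
static bool model_try_get_online_cpus(struct hotplug_model *h)
{
	if (h->writer_active)
		return false;
	h->refcount++;
	return true;
}

static void model_put_online_cpus(struct hotplug_model *h)
{
	assert(h->refcount > 0);
	h->refcount--;
}

/* Writer entry: in this model it succeeds only when no reader is active. */
static bool model_hotplug_begin(struct hotplug_model *h)
{
	if (h->writer_active || h->refcount > 0)
		return false;
	h->writer_active = true;
	return true;
}

static void model_hotplug_done(struct hotplug_model *h)
{
	assert(h->writer_active);
	h->writer_active = false;
}

/*
 * Worker body: backs off when a hotplug operation is in progress.
 * Returns true if the work ran, false if it was skipped and the
 * caller should retry or requeue it later.
 */
static bool model_worker(struct hotplug_model *h)
{
	if (!model_try_get_online_cpus(h))
		return false;
	/* ... work that needs a stable set of online CPUs ... */
	model_put_online_cpus(h);
	return true;
}
```

In this model, calling model_worker() after model_hotplug_begin()
returns false right away rather than blocking, which is the behaviour
Thread 2 in the diagram would need. What the caller does on failure
(retry, requeue, skip) is a separate design question that the sketch
does not answer.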

This was fixed by introducing the CPU_POST_DEAD event, the notification

>
> Thanx, Paul

--
Thanks and Regards
gautham