Re: [PATCH 02/10 V4] workqueue: fix deadlock in rebind_workers()

From: Lai Jiangshan
Date: Tue Sep 04 2012 - 21:26:44 EST


On 09/05/2012 08:54 AM, Tejun Heo wrote:
> How about something like the following? This is more consistent with
> the existing code and as the fixes need to go separately through
> for-3.6-fixes, it's best to stay consistent regardless of the end
> result after all the restructuring. It's not tested yet. If you
> don't object, I'll split it into two patches, test and route them
> through for-3.6-fixes w/ your Original-patch-by.
>

I see that this patch's idea is the same as mine, but it reuses
@idle_rebind.cnt and @idle_rebind.done.

I don't think it is consistent to avoid adding a new field
by reusing an old field for a different purpose.


Thanks
Lai


> Thanks.
> ---
> kernel/workqueue.c | 51 ++++++++++++++++++++++++++++++++++++++-------------
> 1 file changed, 38 insertions(+), 13 deletions(-)
>
> Index: work/kernel/workqueue.c
> ===================================================================
> --- work.orig/kernel/workqueue.c
> +++ work/kernel/workqueue.c
> @@ -1326,6 +1326,15 @@ static void idle_worker_rebind(struct wo
>
> /* we did our part, wait for rebind_workers() to finish up */
> wait_event(gcwq->rebind_hold, !(worker->flags & WORKER_REBIND));
> +
> + /*
> + * rebind_workers() shouldn't finish until all workers passed the
> + * above WORKER_REBIND wait. Tell it when done.
> + */
> + spin_lock_irq(&worker->pool->gcwq->lock);
> + if (!--worker->idle_rebind->cnt)
> + complete(&worker->idle_rebind->done);
> + spin_unlock_irq(&worker->pool->gcwq->lock);
> }
>
> /*
> @@ -1422,19 +1431,7 @@ retry:
> goto retry;
> }
>
> - /*
> - * All idle workers are rebound and waiting for %WORKER_REBIND to
> - * be cleared inside idle_worker_rebind(). Clear and release.
> - * Clearing %WORKER_REBIND from this foreign context is safe
> - * because these workers are still guaranteed to be idle.
> - */
> - for_each_worker_pool(pool, gcwq)
> - list_for_each_entry(worker, &pool->idle_list, entry)
> - worker->flags &= ~WORKER_REBIND;
> -
> - wake_up_all(&gcwq->rebind_hold);

This doesn't need to be moved down.

> -
> - /* rebind busy workers */
> + /* all idle workers are rebound, rebind busy workers */
> for_each_busy_worker(worker, i, pos, gcwq) {
> struct work_struct *rebind_work = &worker->rebind_work;
> unsigned long worker_flags = worker->flags;
> @@ -1454,6 +1451,34 @@ retry:
> worker->scheduled.next,
> work_color_to_flags(WORK_NO_COLOR));
> }
> +
> + /*
> + * All idle workers are rebound and waiting for %WORKER_REBIND to
> + * be cleared inside idle_worker_rebind(). Clear and release.
> + * Clearing %WORKER_REBIND from this foreign context is safe
> + * because these workers are still guaranteed to be idle.
> + *
> + * We need to make sure all idle workers passed WORKER_REBIND wait
> + * in idle_worker_rebind() before returning; otherwise, workers can
> + * get stuck at the wait if hotplug cycle repeats.
> + */
> + idle_rebind.cnt = 1;
> + INIT_COMPLETION(idle_rebind.done);
> +
> + for_each_worker_pool(pool, gcwq) {
> + list_for_each_entry(worker, &pool->idle_list, entry) {
> + worker->flags &= ~WORKER_REBIND;
> + idle_rebind.cnt++;
> + }
> + }
> +
> + wake_up_all(&gcwq->rebind_hold);
> +
> + if (--idle_rebind.cnt) {
> + spin_unlock_irq(&gcwq->lock);
> + wait_for_completion(&idle_rebind.done);
> + spin_lock_irq(&gcwq->lock);
> + }
> }
>
> static struct worker *alloc_worker(void)
>
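
For reference, the count-and-completion handshake in the quoted hunks can
be sketched in userspace C. This is only an illustration under stated
assumptions, not the kernel code: pthreads stand in for the workers, a
mutex stands in for gcwq->lock, condition variables stand in for
gcwq->rebind_hold and the completion, and the names struct idle_rebind_demo,
worker_fn() and run_handshake() are hypothetical. The releasing side holds
one reference of its own (cnt starts at 1), takes one more per worker,
releases everyone, then drops its own reference and waits until the last
worker has passed the wait.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define MAX_WORKERS 64

/* Hypothetical userspace stand-in for the idle_rebind bookkeeping. */
struct idle_rebind_demo {
	pthread_mutex_t lock;	/* plays the role of gcwq->lock */
	pthread_cond_t hold;	/* plays the role of gcwq->rebind_hold */
	pthread_cond_t done;	/* plays the role of the completion */
	int cnt;		/* outstanding references, incl. the releaser's */
	bool released;		/* stands in for WORKER_REBIND being cleared */
	bool completed;		/* set by the last worker past the wait */
	int passed;		/* number of workers that got past the wait */
};

static void *worker_fn(void *arg)
{
	struct idle_rebind_demo *ir = arg;

	pthread_mutex_lock(&ir->lock);
	/* wait until the releasing side lets us go */
	while (!ir->released)
		pthread_cond_wait(&ir->hold, &ir->lock);
	ir->passed++;
	/* the last worker through tells the releasing side it may return */
	if (--ir->cnt == 0) {
		ir->completed = true;
		pthread_cond_signal(&ir->done);
	}
	pthread_mutex_unlock(&ir->lock);
	return NULL;
}

/* Returns how many workers had passed the wait when the releaser resumed. */
int run_handshake(int nworkers)
{
	struct idle_rebind_demo ir = { .cnt = 0 };
	pthread_t tids[MAX_WORKERS];
	int i, rc, passed;

	assert(nworkers >= 0 && nworkers <= MAX_WORKERS);
	pthread_mutex_init(&ir.lock, NULL);
	pthread_cond_init(&ir.hold, NULL);
	pthread_cond_init(&ir.done, NULL);

	for (i = 0; i < nworkers; i++) {
		rc = pthread_create(&tids[i], NULL, worker_fn, &ir);
		assert(rc == 0);
		(void)rc;
	}

	pthread_mutex_lock(&ir.lock);
	ir.cnt = 1;			/* the releaser's own reference */
	ir.cnt += nworkers;		/* one reference per worker */
	ir.released = true;
	pthread_cond_broadcast(&ir.hold);

	/* drop our reference; wait only if some worker is still outstanding */
	if (--ir.cnt) {
		while (!ir.completed)
			pthread_cond_wait(&ir.done, &ir.lock);
	}
	passed = ir.passed;
	pthread_mutex_unlock(&ir.lock);

	for (i = 0; i < nworkers; i++)
		pthread_join(tids[i], NULL);

	pthread_cond_destroy(&ir.done);
	pthread_cond_destroy(&ir.hold);
	pthread_mutex_destroy(&ir.lock);
	return passed;
}
```

Calling run_handshake(4) should return 4: the releasing side does not
resume until every worker has passed the wait, which is the property the
quoted comment asks for before rebind_workers() returns. Starting cnt at 1
keeps the count from reaching zero before all workers have been counted.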
