[PATCH 2/2 V7 for-3.6-fixes] workqueue: fix idle worker depletion

From: Lai Jiangshan
Date: Sun Sep 09 2012 - 22:09:37 EST


If the hotplug code holds manager_mutex when worker_thread() tries to create
a worker, manage_workers() returns false and worker_thread() goes on to
process work items. At that point every worker on the CPU may be processing
work items, with no idle worker left to do the managing. This breaks a basic
assumption of the workqueue design, and it is a bug.

So when manage_workers() fails to grab manager_mutex with trylock, it should
release gcwq->lock and then block on manager_mutex.

After gcwq->lock is released, a hotplug event can happen. But the hotplug
code can't unbind/rebind the manager, so the manager should unconditionally
try to rebind itself and, if that fails, unbind itself.

Signed-off-by: Lai Jiangshan <laijs@xxxxxxxxxxxxxx>
---
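Illustration only (not part of the commit): the locking pattern described
above can be sketched in userspace. This is a minimal sketch under stated
assumptions, not the kernel code. pthread mutexes stand in for gcwq->lock
and manager_mutex, and struct pool, become_manager(), hotplug_thread() and
the demo_*() helpers are hypothetical names that do not exist in
kernel/workqueue.c. As in manage_workers(), a true return value means the
outer lock was dropped and retaken, so the caller must recheck its state.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins for the kernel objects in the patch. */
struct pool {
	pthread_mutex_t lock;		/* plays the role of gcwq->lock */
	pthread_mutex_t manager_mutex;	/* plays the role of manager_mutex */
	bool managing;			/* plays the role of POOL_MANAGING_WORKERS */
};

static struct pool demo_pool = {
	.lock = PTHREAD_MUTEX_INITIALIZER,
	.manager_mutex = PTHREAD_MUTEX_INITIALIZER,
	.managing = false,
};
static atomic_bool hotplug_ready;

/*
 * Called with p->lock held, returns with p->lock held.
 * Returns true if p->lock was dropped and retaken along the way.
 */
static bool become_manager(struct pool *p)
{
	bool dropped = false;

	if (p->managing)
		return false;	/* another worker is already the manager */
	p->managing = true;

	if (pthread_mutex_trylock(&p->manager_mutex) != 0) {
		/* Contended: do not give up. Drop the outer lock and sleep. */
		pthread_mutex_unlock(&p->lock);
		pthread_mutex_lock(&p->manager_mutex);
		pthread_mutex_lock(&p->lock);
		dropped = true;	/* state may have changed while unlocked */
	}

	/* ... management work would go here ... */

	p->managing = false;
	pthread_mutex_unlock(&p->manager_mutex);
	return dropped;
}

/* Stand-in for hotplug: takes manager_mutex first, then the outer lock. */
static void *hotplug_thread(void *arg)
{
	struct pool *p = arg;

	pthread_mutex_lock(&p->manager_mutex);
	atomic_store(&hotplug_ready, true);
	pthread_mutex_lock(&p->lock);	/* waits until the worker drops it */
	pthread_mutex_unlock(&p->lock);
	pthread_mutex_unlock(&p->manager_mutex);
	return NULL;
}

/* Worker meets a "hotplug" thread that already holds manager_mutex. */
static bool demo_contended(void)
{
	pthread_t t;
	bool dropped;
	int rc;

	atomic_store(&hotplug_ready, false);
	pthread_mutex_lock(&demo_pool.lock);
	rc = pthread_create(&t, NULL, hotplug_thread, &demo_pool);
	assert(rc == 0);
	(void)rc;
	while (!atomic_load(&hotplug_ready))
		sched_yield();
	dropped = become_manager(&demo_pool);
	pthread_mutex_unlock(&demo_pool.lock);
	pthread_join(t, NULL);
	return dropped;
}

/* Nobody else holds manager_mutex: the trylock fast path succeeds. */
static bool demo_uncontended(void)
{
	bool dropped;

	pthread_mutex_lock(&demo_pool.lock);
	dropped = become_manager(&demo_pool);
	pthread_mutex_unlock(&demo_pool.lock);
	return dropped;
}
```

Calling demo_contended() exercises the slow path and demo_uncontended() the
trylock fast path. The sketch leaves out the rebind step: in the patch, the
manager additionally calls worker_maybe_bind_and_lock() after taking
manager_mutex, because hotplug may have run while gcwq->lock was released.
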
kernel/workqueue.c | 31 ++++++++++++++++++++++++++++++-
1 files changed, 30 insertions(+), 1 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 383548e..74434c8 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1825,10 +1825,39 @@ static bool manage_workers(struct worker *worker)
struct worker_pool *pool = worker->pool;
bool ret = false;

- if (!mutex_trylock(&pool->manager_mutex))
+ if (pool->flags & POOL_MANAGING_WORKERS)
return ret;

pool->flags |= POOL_MANAGING_WORKERS;
+
+ if (unlikely(!mutex_trylock(&pool->manager_mutex))) {
+ /*
+ * Ouch! rebind_workers() or gcwq_unbind_fn() got there first.
+ * We can't return false here, otherwise it would lead to
+ * worker depletion. So we release gcwq->lock and then
+ * block on manager_mutex.
+ */
+ spin_unlock_irq(&pool->gcwq->lock);
+ mutex_lock(&pool->manager_mutex);
+
+ /*
+ * A hotplug event may have happened after gcwq->lock was
+ * released, so we can't assume that this worker is still
+ * associated with its CPU. We have to try to rebind it
+ * via worker_maybe_bind_and_lock(). If it returns false,
+ * we can conclude that the whole gcwq is disassociated,
+ * and we must unbind this worker. (hotplug code can't
+ * unbind/rebind the manager, because hotplug code can't
+ * iterate over the manager)
+ */
+ if (worker_maybe_bind_and_lock(worker))
+ worker->flags &= ~WORKER_UNBOUND;
+ else
+ worker->flags |= WORKER_UNBOUND;
+
+ ret = true;
+ }
+
pool->flags &= ~POOL_MANAGE_WORKERS;

/*
--
1.7.4.4
