Re: [PATCH 2/3] workqueue: not allow recursion run_workqueue

From: Oleg Nesterov
Date: Thu Feb 05 2009 - 13:03:49 EST


On 02/05, Frederic Weisbecker wrote:
>
> On Thu, Feb 05, 2009 at 06:01:56PM +0100, Oleg Nesterov wrote:
> > On 02/05, Lai Jiangshan wrote:
> > >
> > > DEADLOCK EXAMPLE to explain my option above:
> > >
> > > (work_func0() and work_func1() are work callbacks, and they
> > > call flush_workqueue())
> > >
> > > CPU#0                           CPU#1
> > > run_workqueue()                 run_workqueue()
> > > work_func0()                    work_func1()
> > > flush_workqueue()               flush_workqueue()
> > > flush_cpu_workqueue(0)          .
> > > flush_cpu_workqueue(cpu#1)      flush_cpu_workqueue(cpu#0)
> > > waiting work_func1() in cpu#1   waiting work_func0 in cpu#0
> > >
> > > DEADLOCK!
> >
> > I am not sure. Note that when work_func0() calls run_workqueue(),
> > it will clear cwq->current_work, so another flush_ on CPU#1 will
> > not wait for work_func0, no?
>
> No but CPU#1 can wait for a completion that will never be done, because
> CWQ#0 is waiting for CWQ#1.

Still can't understand. When work_func0()->run_workqueue() returns,
we should have no works in ->worklist and ->current_work must be NULL.
If a barrier was inserted earlier, it should have been flushed as well.
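
The claim above can be checked with a small userspace model. This is only a
sketch, not kernel code: struct cwq, struct work, queue_work_model() and
run_workqueue_model() are hypothetical names that mimic the shape of the
per-CPU workqueue, and the flush issued from inside a callback is modelled
simply as a nested call to the run loop, with no locking and no barrier works.

```c
#include <assert.h>
#include <stddef.h>

struct cwq;

/* Hypothetical stand-in for work_struct. */
struct work {
    void (*fn)(struct work *);
    struct work *next;
    struct cwq *cwq;
    int saw_idle;   /* set by flushing_func(): was the queue idle after the nested run? */
};

/* Hypothetical stand-in for the per-CPU workqueue. */
struct cwq {
    struct work *head, *tail;   /* models ->worklist */
    struct work *current_work;  /* models ->current_work */
    int executed;
};

static void queue_work_model(struct cwq *q, struct work *w)
{
    w->cwq = q;
    w->next = NULL;
    if (q->tail)
        q->tail->next = w;
    else
        q->head = w;
    q->tail = w;
}

/* Pop and run works until the list is empty; clear current_work after each. */
static void run_workqueue_model(struct cwq *q)
{
    while (q->head) {
        struct work *w = q->head;

        q->head = w->next;
        if (!q->head)
            q->tail = NULL;
        q->current_work = w;
        w->fn(w);
        q->current_work = NULL;
        q->executed++;
    }
}

static void plain_func(struct work *w)
{
    (void)w;
}

/* Models work_func0(): flushing from the worker itself recurses into the run loop. */
static void flushing_func(struct work *w)
{
    run_workqueue_model(w->cwq);
    w->saw_idle = (w->cwq->head == NULL && w->cwq->current_work == NULL);
}

/* Returns 1 if the nested run left an empty list and a NULL current_work. */
static int nested_flush_leaves_queue_idle(void)
{
    struct cwq q = { NULL, NULL, NULL, 0 };
    struct work w0 = { flushing_func, NULL, NULL, 0 };
    struct work w1 = { plain_func, NULL, NULL, 0 };
    struct work w2 = { plain_func, NULL, NULL, 0 };

    queue_work_model(&q, &w0);
    queue_work_model(&q, &w1);
    queue_work_model(&q, &w2);
    run_workqueue_model(&q);

    return w0.saw_idle && q.executed == 3;
}
```

In this model, flushing_func() is still running when the nested loop returns,
yet current_work is already NULL, so a flusher looking at this queue at that
moment would not see it as busy.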


But yes, deadlock is possible, if other works come after run_workqueue()
returns and before work_func1() starts the flush. Just the description is
not exactly accurate, imho.

And we have other problems. For one, nothing guarantees that
run_workqueue() will ever return: it is perfectly legitimate for a
work_struct to always re-queue itself, as long as it is cancelled
before destroy_workqueue().
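
A self-requeueing work can be modelled in userspace to show why the run loop
need not terminate. Again this is a sketch under assumptions: all names are
hypothetical, an iteration budget stands in for "never returns", and
cancellation is modelled as the work declining to re-queue itself after
cancel_after runs rather than as an external cancel call.

```c
#include <stdbool.h>
#include <stddef.h>

struct cwq;

/* Hypothetical stand-in for a work_struct that re-queues itself. */
struct work {
    void (*fn)(struct work *);
    struct work *next;
    struct cwq *cwq;
    int runs;
    int cancel_after;   /* negative: never cancelled */
};

struct cwq {
    struct work *head, *tail;   /* models ->worklist */
};

static void enqueue(struct cwq *q, struct work *w)
{
    w->cwq = q;
    w->next = NULL;
    if (q->tail)
        q->tail->next = w;
    else
        q->head = w;
    q->tail = w;
}

/* Re-queue on every run unless the (modelled) cancellation has happened. */
static void requeue_self(struct work *w)
{
    w->runs++;
    if (w->cancel_after < 0 || w->runs < w->cancel_after)
        enqueue(w->cwq, w);
}

/* Run loop with a budget: returns true if the list drained, false if the budget ran out. */
static bool run_bounded(struct cwq *q, int budget)
{
    while (q->head) {
        struct work *w;

        if (budget-- == 0)
            return false;
        w = q->head;
        q->head = w->next;
        if (!q->head)
            q->tail = NULL;
        w->fn(w);
    }
    return true;
}

static bool drains_within(int budget, int cancel_after)
{
    struct cwq q = { NULL, NULL };
    struct work w = { requeue_self, NULL, NULL, 0, cancel_after };

    enqueue(&q, &w);
    return run_bounded(&q, budget);
}
```

With cancel_after set to -1 the loop exhausts any budget it is given; with a
finite cancel_after it drains as soon as the work stops re-queueing itself.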

Oleg.
