RE: [PATCH] fix-flush_workqueue-vs-cpu_dead-race-update

From: Pallipadi, Venkatesh
Date: Mon Jan 08 2007 - 13:37:52 EST

>-----Original Message-----
>From: linux-kernel-owner@xxxxxxxxxxxxxxx
>[mailto:linux-kernel-owner@xxxxxxxxxxxxxxx] On Behalf Of Oleg Nesterov
>Sent: Monday, January 08, 2007 9:07 AM
>To: Srivatsa Vaddagiri
>Cc: Andrew Morton; David Howells; Christoph Hellwig; Ingo Molnar;
>Linus Torvalds; linux-kernel@xxxxxxxxxxxxxxx; Gautham shenoy
>Subject: Re: [PATCH] fix-flush_workqueue-vs-cpu_dead-race-update
>
>On 01/08, Srivatsa Vaddagiri wrote:
>>
>> On Mon, Jan 08, 2007 at 06:56:38PM +0300, Oleg Nesterov wrote:
>> > > 2.
>> > >
>> > > CPU_DEAD->cleanup_workqueue_thread->(cwq->thread = NULL)->kthread_stop() ..
>> > >                                     ^^^^^^^^^^^^^^^^^^^^
>> > >                                               |___ Problematic
>> >
>> > Hmm... This should not be possible? cwq->thread != NULL on CPU_DEAD event.
>>
>> sure, cwq->thread != NULL at CPU_DEAD event. However
>> cleanup_workqueue_thread() will set it to NULL and block in
>> kthread_stop(), waiting for the kthread to finish run_workqueue and
>> exit.
>
>Ah, missed your point, thanks. Yet another old problem which was not
>introduced by recent changes. And yet another indication we should avoid
>kthread_stop() on CPU_DEAD event :) I believe this is easy to fix, but
>need to think more.

The current code in the workqueue hotplug path is full of races. I
stumbled upon at least a couple of the deadlock situations being
discussed here, with the ondemand governor using a workqueue and trying
to flush it during CPU hot-remove.

Specifically, a three-way deadlock: kthread_stop() is called with
workqueue_mutex held, the work itself is blocked on some other mutex,
and that mutex is held by another task that is trying to flush the
workqueue.

One other approach I was thinking about is to do all the hard work in
the workqueue CPU_DOWN_PREPARE callback rather than in CPU_DEAD. We can
call cleanup_workqueue_thread() and take_over_work() in DOWN_PREPARE.
With that, I don't think we need to hold workqueue_mutex across these
two callbacks, which would eliminate the deadlocks related to
flush_workqueue(). Do you think this approach would simplify things
here?

Thanks,
Venki