Re: [PATCH] fix-flush_workqueue-vs-cpu_dead-race-update

From: Srivatsa Vaddagiri
Date: Mon Jan 08 2007 - 23:39:52 EST


On Mon, Jan 08, 2007 at 08:06:35PM +0300, Oleg Nesterov wrote:
> Ah, missed you point, thanks. Yet another old problem which was not introduced
> by recent changes. And yet another indication we should avoid kthread_stop()
> on CPU_DEAD event :) I believe this is easy to fix, but need to think more.

I think the problem is not just with CPU_DEAD. Anyone who calls
cleanup_workqueue_thread (say destroy_workqueue?) will see this race.

Do you see any problems if cleanup_workqueue_thread is changed as:

cleanup_workqueue_thread()
{
kthread_stop(p);
spin_lock(cwq->lock);
cwq->thread = NULL;
spin_unlock(cwq->lock);
}


> run_workqueue:
>
> while (!list_empty(&cwq->worklist)) {
> ...
> // We hold lock_cpu_hotplug(), cpu event can't make
> // progress.
> ...
> }

Ok ..yes a cpu_event_waits_for_lock() helper will help here.

> > I agree it minimizes the interactions. Maybe worth attempting. However I
> > suspect it may not be as simple as it appears :)
>
> Yes, that is why this patch only does the first step: flush_workqueue() checks
> the dead CPUs as well, this change is minimal.
>
> Do you see any problems this patch adds?

I dont see as of now. I suspect we will know better when we implement
the patch to eliminate CPU_DEAD handling in workqueue.c

--
Regards,
vatsa
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/