Re: [PATCH wq/for-3.6-fixes 3/3] workqueue: fix possible idleworker depletion during CPU_ONLINE

From: Tejun Heo
Date: Fri Sep 07 2012 - 15:29:36 EST


Hello,

On Fri, Sep 07, 2012 at 11:10:34AM +0800, Lai Jiangshan wrote:
> > This patch fixes the bug by releasing manager_mutexes before letting
> > the rebound idle workers go. This ensures that by the time idle
> > workers check whether management is necessary, CPU_ONLINE already has
> > released the positions.
>
> Could you review manage_workers_slowpath() in the V4 patchset?
> It has enough changelog and comments.
>
> After the discussion,
>
> We don't move the hotplug handling outside of the hotplug code. It matches
> this requirement.

Was that the one which deferred calling manager function to a work
item on trylock failure?

> Since we introduced manage_mutex(), any place should be allowed to grab it
> when its context allows. So this bug is not the hotplug code's responsibility.
>
> manage_workers() just uses mutex_trylock() to grab the lock; it does not try
> hard to do its job when needed, and it does not try to find out the reason
> for the failure. So I think it is manage_workers()'s responsibility to handle
> this bug. A manage_workers_slowpath() is enough to fix the bug.

It doesn't really matter how the synchronization between regular
manager and hotplug path is done. The point is that hotplug path, as
much as possible, should be responsible for any incurred complexities,
so I'd really like to stay away from adding a completely different way
for the manager to be invoked in the usual path, if at all possible.
Let's try to solve this from the hotplug side.
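
To make the ordering problem from the quoted patch description concrete,
here is a rough single-threaded toy model, not kernel code. Every name in
it (pool_model, model_trylock(), release_idle_workers(), and so on) is
hypothetical, and a plain flag stands in for the manager mutex. It only
assumes what the description says: a released idle worker uses a trylock
to decide whether it can manage, and gives up on failure.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy, single-threaded model; all names are hypothetical, not kernel API. */
struct pool_model {
	bool manager_held;	/* stands in for the pool's manager mutex */
	int nr_idle;		/* rebound idle workers waiting to be let go */
	int nr_managers;	/* workers that got the manager position */
};

/* Like a trylock: take the lock only if it is currently free. */
static bool model_trylock(struct pool_model *p)
{
	if (p->manager_held)
		return false;
	p->manager_held = true;
	return true;
}

static void model_unlock(struct pool_model *p)
{
	p->manager_held = false;
}

/* Each released idle worker checks once whether it can act as manager. */
static void release_idle_workers(struct pool_model *p)
{
	while (p->nr_idle > 0) {
		p->nr_idle--;
		if (model_trylock(p)) {
			p->nr_managers++;
			model_unlock(p);
		}
		/* On trylock failure the worker simply moves on. */
	}
}

/* Problematic order: workers are let go while the lock is still held. */
static int online_release_workers_first(int nr_idle)
{
	struct pool_model p = { .manager_held = true, .nr_idle = nr_idle };

	assert(nr_idle >= 0);
	release_idle_workers(&p);
	model_unlock(&p);
	return p.nr_managers;
}

/* Order from the patch description: drop the lock, then let workers go. */
static int online_unlock_first(int nr_idle)
{
	struct pool_model p = { .manager_held = true, .nr_idle = nr_idle };

	assert(nr_idle >= 0);
	model_unlock(&p);
	release_idle_workers(&p);
	return p.nr_managers;
}
```

In this model, online_release_workers_first() leaves no worker in the
manager position because every trylock fails, while online_unlock_first()
lets the released workers take it. The real code has more going on, but
the ordering is the part the hotplug side can fix on its own.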

Thanks.

--
tejun