Re: [PATCH v7 4/4] workqueue: Unbind kworkers before sending them to exit()

From: Valentin Schneider
Date: Wed Jan 11 2023 - 07:50:46 EST


On 10/01/23 10:28, Tejun Heo wrote:
> Hello,
>
> The series generally looks good to me. Just one thing.
>
> On Mon, Jan 09, 2023 at 01:33:16PM +0000, Valentin Schneider wrote:
>> @@ -3658,13 +3702,24 @@ static void put_unbound_pool(struct worker_pool *pool)
>>  				   TASK_UNINTERRUPTIBLE);
>>  	pool->flags |= POOL_MANAGER_ACTIVE;
>>
>> +	/*
>> +	 * We need to hold wq_pool_attach_mutex() while destroying the workers,
>> +	 * but we can't grab it in rcuwait_wait_event() as it can clobber
>> +	 * current's task state. We can drop pool->lock here as we've set
>> +	 * POOL_MANAGER_ACTIVE, no one else can steal our manager position.
>> +	 */
>> +	raw_spin_unlock_irq(&pool->lock);
>> +	mutex_lock(&wq_pool_attach_mutex);
>> +	raw_spin_lock_irq(&pool->lock);
>
> The original pattern was a bit weird to begin with and this makes it quite
> a bit worse.

That it does!

> Let's do something more straightforward like:
>
> 	while (true) {
> 		rcuwait_wait_event(&manager_wait,
> 				   !(pool->flags & POOL_MANAGER_ACTIVE),
> 				   TASK_UNINTERRUPTIBLE);
> 		mutex_lock(&wq_pool_attach_mutex);
> 		raw_spin_lock_irq(&pool->lock);
> 		if (!(pool->flags & POOL_MANAGER_ACTIVE)) {
> 			pool->flags |= POOL_MANAGER_ACTIVE;
> 			break;
> 		}
> 		raw_spin_unlock_irq(&pool->lock);
> 		mutex_unlock(&wq_pool_attach_mutex);
> 	}
>

That should do the trick, I'll go test it out.
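
For anyone following along, here is a rough userspace sketch of that
wait / lock / recheck loop. It is only an analogy under stated assumptions:
pthread mutexes and a condition variable stand in for wq_pool_attach_mutex,
pool->lock and rcuwait, and struct pool, attach_mutex, wait_manager_idle(),
claim_manager() and release_manager() are hypothetical names, not kernel
code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define POOL_MANAGER_ACTIVE 0x1u

/* Hypothetical stand-in for struct worker_pool. */
struct pool {
	pthread_mutex_t lock;		/* plays the role of pool->lock */
	pthread_cond_t manager_wait;	/* plays the role of the rcuwait */
	unsigned int flags;
};

/* Plays the role of wq_pool_attach_mutex. */
static pthread_mutex_t attach_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Sleep until the manager flag looks clear; returns with no lock held. */
static void wait_manager_idle(struct pool *pool)
{
	pthread_mutex_lock(&pool->lock);
	while (pool->flags & POOL_MANAGER_ACTIVE)
		pthread_cond_wait(&pool->manager_wait, &pool->lock);
	pthread_mutex_unlock(&pool->lock);
}

/*
 * Returns with attach_mutex and pool->lock held and the manager flag set
 * by the caller. The flag is rechecked under both locks because another
 * thread may have become manager after the wait finished.
 */
static void claim_manager(struct pool *pool)
{
	while (true) {
		wait_manager_idle(pool);
		pthread_mutex_lock(&attach_mutex);
		pthread_mutex_lock(&pool->lock);
		if (!(pool->flags & POOL_MANAGER_ACTIVE)) {
			pool->flags |= POOL_MANAGER_ACTIVE;
			break;
		}
		/* Lost the race: drop both locks and wait again. */
		pthread_mutex_unlock(&pool->lock);
		pthread_mutex_unlock(&attach_mutex);
	}
}

/* Must be called with both locks held, as left by claim_manager(). */
static void release_manager(struct pool *pool)
{
	assert(pool->flags & POOL_MANAGER_ACTIVE);
	pool->flags &= ~POOL_MANAGER_ACTIVE;
	pthread_cond_broadcast(&pool->manager_wait);
	pthread_mutex_unlock(&pool->lock);
	pthread_mutex_unlock(&attach_mutex);
}
```

The recheck under both locks is what lets the sleeping mutex be taken
outside the wait: if someone else grabbed the manager role in between, we
drop both locks and go back to waiting.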


While we're here, for my own education I was trying to figure out in what
scenarios we can hit this manager-already-active condition. When sending
out v6 I had convinced myself it could happen during failed
initialization of a new unbound pool, but having another look at it now I'm
not so sure anymore.

The only scenario I can think of now is around maybe_create_worker()'s
release of pool->lock, since that lets a different worker drain
pool->worklist, and thus lets pool->refcnt reach 0, while the worker that
dropped the lock is still the pool manager. Am I looking at the right thing?

Thanks