Re: [PATCH] workqueue: flush all pending jobs in destroy_workqueue()

From: Philipp Stanner
Date: Fri Apr 25 2025 - 05:57:38 EST


On Fri, 2025-04-25 at 09:33 +0000, Alice Ryhl wrote:
> On Thu, Apr 24, 2025 at 09:57:55AM -1000, Tejun Heo wrote:
> > Hello, Alice.
> >
> > On Wed, Apr 23, 2025 at 05:51:27PM +0000, Alice Ryhl wrote:
> > ...
> > > @@ -367,6 +367,8 @@ struct workqueue_struct {
> > >   struct lockdep_map __lockdep_map;
> > >   struct lockdep_map *lockdep_map;
> > >  #endif
> > > + raw_spinlock_t delayed_lock; /* protects pending_list */
> > > + struct list_head delayed_list; /* list of pending delayed jobs */
> >
> > I think we'll have to make this per-CPU or per-pwq. There can be a lot of
> > delayed work items being queued on, e.g., system_wq. Imagine that happening
> > on a multi-socket NUMA system. That cacheline is going to be bounced around
> > pretty hard.
>
> Hmm. I think we would need to add a new field to delayed_work to keep
> track of which list it has been added to.
>
> Another option could be to add a boolean that disables the list. After
> all, we never call destroy_workqueue() on system_wq so we don't need the
> list for that workqueue.
>
> Thoughts?
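To make the two options a bit more concrete, here is a rough userspace
model in C. It is not kernel code and not a proposed patch; every name in
it (model_wq, model_dwork, NR_SHARDS, track_delayed, owner) is
hypothetical. It shards the pending list per "CPU" as Tejun suggests,
records on each item which shard's list it sits on (Alice's extra field),
and lets a workqueue opt out of tracking entirely (the boolean). A C11
atomic_flag spinlock stands in for raw_spinlock_t, and concurrency between
the timer path and the teardown flush is deliberately not modeled.

```c
/* Hypothetical userspace model, NOT the kernel workqueue API. */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_SHARDS 4 /* hypothetical stand-in for per-CPU / per-pwq lists */

/* Minimal intrusive doubly linked list, loosely modeled on list_head. */
struct list_node { struct list_node *prev, *next; };
static void list_init(struct list_node *h) { h->prev = h->next = h; }
static bool list_empty(const struct list_node *h) { return h->next == h; }
static void list_add(struct list_node *n, struct list_node *h)
{
	n->next = h->next; n->prev = h; h->next->prev = n; h->next = n;
}
static void list_del(struct list_node *n)
{
	n->prev->next = n->next; n->next->prev = n->prev; list_init(n);
}

/* Userspace stand-in for raw_spinlock_t. */
struct spin { atomic_flag f; };
static void spin_lock(struct spin *s)
{
	while (atomic_flag_test_and_set_explicit(&s->f, memory_order_acquire)) { /* spin */ }
}
static void spin_unlock(struct spin *s) { atomic_flag_clear_explicit(&s->f, memory_order_release); }

struct shard { struct spin lock; struct list_node pending; };

struct model_wq {
	bool track_delayed;            /* hypothetical opt-out, e.g. false for system_wq */
	struct shard shards[NR_SHARDS];
};

struct model_dwork {
	struct list_node node;
	struct shard *owner;           /* shard whose list we are on, NULL if none */
	int ran;                       /* how often the "work function" ran */
};

static void model_wq_init(struct model_wq *wq, bool track)
{
	wq->track_delayed = track;
	for (int i = 0; i < NR_SHARDS; i++) {
		atomic_flag_clear(&wq->shards[i].lock.f);
		list_init(&wq->shards[i].pending);
	}
}

/* Queue with a delay: only the chosen shard's lock is taken. */
static void model_queue_delayed(struct model_wq *wq, struct model_dwork *dw, int cpu)
{
	dw->owner = NULL;
	list_init(&dw->node);
	if (!wq->track_delayed)
		return;                /* untracked workqueue: no list, no lock */
	struct shard *s = &wq->shards[cpu % NR_SHARDS];
	spin_lock(&s->lock);
	list_add(&dw->node, &s->pending);
	dw->owner = s;
	spin_unlock(&s->lock);
}

/* Timer expiry: the owner field tells us which single list to lock. */
static void model_timer_fire(struct model_dwork *dw)
{
	struct shard *s = dw->owner;
	if (s) {
		spin_lock(&s->lock);
		assert(!list_empty(&dw->node)); /* tracked items are on a list */
		list_del(&dw->node);
		dw->owner = NULL;
		spin_unlock(&s->lock);
	}
	dw->ran++;
}

/* Teardown path: run every still-pending item instead of dropping it. */
static int model_flush_pending(struct model_wq *wq)
{
	int flushed = 0;
	for (int i = 0; i < NR_SHARDS; i++) {
		struct shard *s = &wq->shards[i];
		spin_lock(&s->lock);
		while (!list_empty(&s->pending)) {
			struct list_node *n = s->pending.next;
			struct model_dwork *dw = (struct model_dwork *)
				((char *)n - offsetof(struct model_dwork, node));
			list_del(n);
			dw->owner = NULL;
			spin_unlock(&s->lock);
			dw->ran++;     /* stands in for executing the work function */
			flushed++;
			spin_lock(&s->lock);
		}
		spin_unlock(&s->lock);
	}
	return flushed;
}
```

With NR_SHARDS = 4, items queued on "CPUs" 1 and 5 land on the same shard;
if one of them fires first, the teardown flush only runs the remaining
ones. The owner field is what lets expiry and cancel take a single shard
lock instead of a global one, which is the cacheline concern above.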

For my part, I was astonished to find this half-bug in the WQ
implementation at all: workqueues are a) very important and b) very
intensively used, so I had been sure the bug *must* be on my side.
The fact that it wasn't suggests to me that not many users in the kernel
tear down a workqueue while delayed work items on it are still
uncanceled.

You also need somewhat unlucky timing to run into the problem.

I'm not sure how relevant that is to the synchronization overhead
Tejun describes, but take it for what it's worth.


P.

>
> Alice