Re: [PATCH 2/4] workqueue: don't check wq->rescuer in rescuer

From: Lai Jiangshan
Date: Fri May 29 2020 - 10:59:00 EST


On Fri, May 29, 2020 at 10:14 PM Tejun Heo <tj@xxxxxxxxxx> wrote:
>
> On Fri, May 29, 2020 at 06:59:00AM +0000, Lai Jiangshan wrote:
> > Now rescuer checks pwq->nr_active before requeues the pwq,
> > it is a more robust check and the rescuer must be still valid.
> >
> > Signed-off-by: Lai Jiangshan <laijs@xxxxxxxxxxxxxxxxx>
> > ---
> > kernel/workqueue.c | 23 +++++++++--------------
> > 1 file changed, 9 insertions(+), 14 deletions(-)
> >
> > diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> > index b2b15f1f0c8d..8d017727bfbc 100644
> > --- a/kernel/workqueue.c
> > +++ b/kernel/workqueue.c
> > @@ -248,7 +248,7 @@ struct workqueue_struct {
> > struct list_head flusher_overflow; /* WQ: flush overflow list */
> >
> > struct list_head maydays; /* MD: pwqs requesting rescue */
> > - struct worker *rescuer; /* MD: rescue worker */
> > + struct worker *rescuer; /* I: rescue worker */
> >
> > int nr_drainers; /* WQ: drain in progress */
> > int saved_max_active; /* WQ: saved pwq max_active */
> > @@ -2532,12 +2532,13 @@ static int rescuer_thread(void *__rescuer)
> > if (pwq->nr_active && need_to_create_worker(pool)) {
> > spin_lock(&wq_mayday_lock);
> > /*
> > - * Queue iff we aren't racing destruction
> > - * and somebody else hasn't queued it already.
> > + * Queue iff somebody else hasn't queued it
> > + * already.
> > */
> > - if (wq->rescuer && list_empty(&pwq->mayday_node)) {
> > + if (list_empty(&pwq->mayday_node)) {
> > get_pwq(pwq);
> > - list_add_tail(&pwq->mayday_node, &wq->maydays);
> > + list_add_tail(&pwq->mayday_node,
> > + &wq->maydays);
>
> send_mayday() also checks for wq->rescuer, so when sanity check fails,
> scenarios which would have leaked a workqueue after destroying its rescuer
> can lead to use-after-free after the patch. I'm not quite sure why the patch
> is an improvement.
>

Hi

I'm not sure I understood your words. And I'm not
sure which function may use freed object in "use-after-free".
Is it "send_mayday() may use a freed rescuer"?

This patch relies on
def98c84b6 ("workqueue: Fix spurious sanity check failures in
destroy_workqueue()")
to move the kthread_stop() before the sanity check and the work
of drain_workqueue() which guarantees there is no work item
in the workqueue. If send_mayday() still goes wrong after
drain_workqueue(), the user must have queued work items and
invoked destroy_workqueue() concurrently. It is excellent
if the sanity check can find this case out, but it is not possible
that the sanity check can always live through it since it is
not worqueue's internal fault. We hope the sanity check can
find all the internal fault, but not to the extend that
it can always work when any user uses it in a very wrong way.

Thanks
Lai.