Re: [PATCH] workqueue: skip nr_running sanity check in worker_enter_idle() if trustee is active

From: Paul E. McKenney
Date: Wed May 16 2012 - 20:16:09 EST


On Mon, May 14, 2012 at 03:41:23PM -0700, Paul E. McKenney wrote:
> On Mon, May 14, 2012 at 03:12:50PM -0700, Tejun Heo wrote:
> > From 544ecf310f0e7f51fa057ac2a295fc1b3b35a9d3 Mon Sep 17 00:00:00 2001
> > From: Tejun Heo <tj@xxxxxxxxxx>
> > Date: Mon, 14 May 2012 15:04:50 -0700
> >
> > worker_enter_idle() has WARN_ON_ONCE() which triggers if nr_running
> > isn't zero when every worker is idle. This can trigger spuriously
> > while a cpu is going down due to the way trustee sets %WORKER_ROGUE
> > and zaps nr_running.
> >
> > It first sets %WORKER_ROGUE on all workers without updating
> > nr_running, releases gcwq->lock, schedules, regrabs gcwq->lock and
> > then zaps nr_running. If the last running worker enters idle
> > in between, it would see a stale nr_running that hasn't been zapped yet
> > and trigger the WARN_ON_ONCE().
> >
> > Fix it by performing the sanity check iff the trustee is idle.
> >
> > Signed-off-by: Tejun Heo <tj@xxxxxxxxxx>
> > Reported-by: "Paul E. McKenney" <paulmck@xxxxxxxxxxxxxxxxxx>
> > Cc: stable@xxxxxxxxxxxxxxx
> > ---
> > Sorry about the delay. After scratching my head quite a bit, I found
> > where during cpu-offlining such discrepancy may happen. I'm fairly
> > sure this is it but I might be wrong, so please include this patch in
> > your test setup and let me know how it goes.
>
> Thank you -- I have applied it, and will let you know how it goes.
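The race in the quoted patch description can be replayed in plain userspace C. This is a hypothetical, single-threaded sketch, not the kernel code: struct model_gcwq, its fields, trustee_set_rogue(), trustee_zap_nr_running() and replay_race() are made up for illustration. Only worker_enter_idle(), nr_running, %WORKER_ROGUE and the trustee come from the patch text. The interleaving is replayed in a fixed order rather than with real threads. As a labeled simplification, the model assumes that a worker already marked rogue does not decrement nr_running when it goes idle, which is what leaves the counter stale until the trustee zaps it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the per-cpu workqueue state; NOT kernel code. */
struct model_gcwq {
    int nr_workers;      /* total workers */
    int nr_idle;         /* workers currently idle */
    int nr_running;      /* concurrency counter, zapped late by the trustee */
    bool rogue;          /* trustee has set WORKER_ROGUE on all workers */
    bool trustee_active; /* simplification: trustee not idle */
    int warnings;        /* times the sanity check would have fired */
};

/* Trustee step 1: mark every worker rogue, leave nr_running untouched. */
static void trustee_set_rogue(struct model_gcwq *g)
{
    g->trustee_active = true;
    g->rogue = true;
}

/* Trustee step 2 (after dropping the lock and rescheduling): zap it. */
static void trustee_zap_nr_running(struct model_gcwq *g)
{
    g->nr_running = 0;
}

/* Model of worker_enter_idle(); 'patched' gates the check on the trustee. */
static void worker_enter_idle(struct model_gcwq *g, bool patched)
{
    assert(g->nr_idle < g->nr_workers);
    /* Assumption: a rogue worker already counts as not running, so going
     * idle does not decrement nr_running. */
    if (!g->rogue)
        g->nr_running--;
    g->nr_idle++;

    /* Unpatched: always check. Patched: check only if trustee is idle. */
    bool check = !patched || !g->trustee_active;
    if (check && g->nr_workers == g->nr_idle && g->nr_running != 0)
        g->warnings++; /* stands in for the WARN_ON_ONCE() condition */
}

/* Replays the interleaving from the patch; returns the warning count. */
static int replay_race(bool patched)
{
    struct model_gcwq g = {
        .nr_workers = 2, .nr_idle = 1, .nr_running = 1,
    };
    trustee_set_rogue(&g);          /* gcwq->lock held */
    /* trustee releases gcwq->lock and schedules here */
    worker_enter_idle(&g, patched); /* last running worker goes idle */
    /* trustee regrabs gcwq->lock */
    trustee_zap_nr_running(&g);
    return g.warnings;
}
```

In this model, replay_race(false) records one spurious warning and replay_race(true) records none. That matches the patch's reasoning that running the sanity check only while the trustee is idle avoids the false positive during cpu offlining.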

Tested-by: Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>

Thanx, Paul
