Re: [2.6.20] BUG: workqueue leaked lock

From: Neil Brown
Date: Mon Mar 19 2007 - 02:25:08 EST


On Friday March 16, akpm@xxxxxxxxxxxxxxxxxxxx wrote:
>
> OK. That's not necessarily a bug: one could envisage a (weird) piece of
> code which takes a lock then releases it on a later workqueue invokation.
> But I'm not sure that nfs4_laundromat() is actually supposed to be doing
> anything like that.
>
> Then again, maybe it is: it seems to be waddling through a directory under
> the control of a little state machine, with timeouts.
>
> Neil: help?

I'm quite certain that laundromat_main does *not* leave client_mutex
locked as the last thing it does is call nfs4_unlock_state which is
mutex_unlock(&client_mutex);
To me, that raises some doubt about whether the lock leak check is
working properly...
It is somewhat harder to track locking of i_mutex, but it seems to me
that every time it is taken, it is released again shortly afterwards.

So I think this must be a problem with leak detection, not with NFSd.

NeilBrown


> On Fri, 16 Mar 2007 09:41:20 +0100 Peter Zijlstra <a.p.zijlstra@xxxxxxxxx> wrote:
>
> > On Thu, 2007-03-15 at 11:06 -0800, Andrew Morton wrote:
> > > > On Tue, 13 Mar 2007 17:50:14 +0100 Folkert van Heusden <folkert@xxxxxxxxxxxxxx> wrote:
> > > > ...
> > > > [ 1756.728209] BUG: workqueue leaked lock or atomic: nfsd4/0x00000000/3577
> > > > [ 1756.728271] last function: laundromat_main+0x0/0x69 [nfsd]
> > > > [ 1756.728392] 2 locks held by nfsd4/3577:
> > > > [ 1756.728435] #0: (client_mutex){--..}, at: [<c1205b88>] mutex_lock+0x8/0xa
> > > > [ 1756.728679] #1: (&inode->i_mutex){--..}, at: [<c1205b88>] mutex_lock+0x8/0xa
> > > > [ 1756.728923] [<c1003d57>] show_trace_log_lvl+0x1a/0x30
> > > > [ 1756.729015] [<c1003d7f>] show_trace+0x12/0x14
> > > > [ 1756.729103] [<c1003e79>] dump_stack+0x16/0x18
> > > > [ 1756.729187] [<c102c2e8>] run_workqueue+0x167/0x170
> > > > [ 1756.729276] [<c102c437>] worker_thread+0x146/0x165
> > > > [ 1756.729368] [<c102f797>] kthread+0x97/0xc4
> > > > [ 1756.729456] [<c1003bdb>] kernel_thread_helper+0x7/0x10
> > > > [ 1756.729547] =======================
> > > > [ 1792.436492] svc: unknown version (0 for prog 100003, nfsd)
> > > > [ 1846.683648] BUG: workqueue leaked lock or atomic: nfsd4/0x00000000/3577
> > > > [ 1846.683701] last function: laundromat_main+0x0/0x69 [nfsd]
> > > > [ 1846.683832] 2 locks held by nfsd4/3577:
> > > > [ 1846.683885] #0: (client_mutex){--..}, at: [<c1205b88>] mutex_lock+0x8/0xa
> > > > [ 1846.683980] #1: (&inode->i_mutex){--..}, at: [<c1205b88>] mutex_lock+0x8/0xa
> > > > [ 1846.683988] [<c1003d57>] show_trace_log_lvl+0x1a/0x30
> > > > [ 1846.683994] [<c1003d7f>] show_trace+0x12/0x14
> > > > [ 1846.683997] [<c1003e79>] dump_stack+0x16/0x18
> > > > [ 1846.684001] [<c102c2e8>] run_workqueue+0x167/0x170
> > > > [ 1846.684006] [<c102c437>] worker_thread+0x146/0x165
> > > > [ 1846.684012] [<c102f797>] kthread+0x97/0xc4
> > > > [ 1846.684023] [<c1003bdb>] kernel_thread_helper+0x7/0x10
> > >
> > > Oleg, that's a fairly incomprehensible message we have in there. Can you
> > > please explain what it means?
> >
> > I think I'm responsible for this message (commit
> > d5abe669172f20a4129a711de0f250a4e07db298); what is says is that the
> > function executed by the workqueue (here laundromat_main) leaked an
> > atomic context or is still holding locks (2 locks in this case).
> >
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/