Re: INFO: possible circular locking dependency atcleanup_workqueue_thread

From: Peter Zijlstra
Date: Tue May 19 2009 - 05:13:28 EST


On Tue, 2009-05-19 at 00:14 +0200, Oleg Nesterov wrote:
> On 05/18, Peter Zijlstra wrote:
> >
> > On Mon, 2009-05-18 at 22:16 +0200, Oleg Nesterov wrote:
> > > On 05/18, Peter Zijlstra wrote:
> > > >
> > > > On Mon, 2009-05-18 at 21:47 +0200, Oleg Nesterov wrote:
> > > > >
> > > > > This output looks obviously wrong, Z does not depend on L1 or any
> > > > > other lock.
> > > >
> > > > It does, L1 -> L2 -> Z as per 1 and 2
> > > > which 3 obviously reverses.
> > >
> > > Yes, yes, I see. And, as I said, I can't explain what I mean.
> > >
> > > I mean... The output above looks as if we take L1 and Z in wrong order.
> > > But Z has nothing to do with this deadlock, it can't depend on any lock
> > > from the correctness pov. Except yes, we have it in L1->L2->Z->L1 cycle.
> >
> > AB-BC-CA deadlock
> >
> > Thread 1 Thread 2 Thread 3
> >
> > L(L1)
> > L(L2)
> > L(Z)
> > L(L2)
> > L(Z)
> > L(L1)
>
> Sure. Now Z really depends on L1. But if you change Thread 3 to take yet
> another unique lock X under Z, then lockdep will complain that X depends
> on L1, not Z.
>
> To clarify, I do not say the output is bugggy. I only meant it could be
> better. But I don't understand how to improve it.
>
> If we return to the original bug report, perhaps cpu_add_remove_lock
> has nothing to do with this problem... we could have the similar output
> if device_pm_lock() is called from work_struct.
>
> > And you're saying, we can't have that deadlock because we don't have the
> > 3 separate functions?
>
> No,
>
> > That is, there is no concurrency on Z because its always taken under L2?
>
> Yes, nobody else can hold Z when we take L2.
>
> But this wasn't my point.
>
> > For those situations we have the spin_lock_nest_lock(X, y) annotation,
> > where we say, there cannot be any concurrency on x element of X, because
> > all such locks are always taken under y.
>
> We can just kill L(Z) instead of annotating, this changes nothing from
> the correctness pov, we have the same deadlock. But the output becomes
> very clear: L1 depends on L2.
>
>
> OK, please forget. Not sure why I started this thread. Just because I
> was surprised a bit when I figured out that lockdep's output does not
> match my naive expectations.

Well, since you're quite versed in the field, I'm guessing other people
might find it even more confusing -- so it might well do to explore this
situation a bit further, if only to see if we can make lockdep output
'easier'.

There is a solution to this, Gautham suggested it a while back, we could
make lockdep scan a lock (Z) his dependencies and if in every chain a
particular other lock (L2) was taken, ignore this lock (Z) his
dependency for the circular analysis at hand.

That would mean we would not find the Z->L1 dep to violate the existing
one, because we would ignore L2->Z (because in every Z we hold L2), and
we would indeed fail on the next: L2->L1 on the next line in your
initial program.

Implementing this however might be slightly less trivial than this
explanation -- it would however rid us of the spin_lock_nest_lock()
annotation's need.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/