Re: [PATCH] locking/lockdep: Add debug_locks check in __lock_downgrade()

From: Dmitry Vyukov
Date: Mon Jan 14 2019 - 08:39:39 EST


On Mon, Jan 14, 2019 at 2:37 PM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
> On Thu, Jan 10, 2019 at 11:21:13AM +0100, Dmitry Vyukov wrote:
> > On Thu, Jan 10, 2019 at 5:04 AM Waiman Long <longman@xxxxxxxxxx> wrote:
> > >
> > > Tetsuo Handa had reported that he saw an incorrect "downgrading a read
> > > lock" warning right after a previous lockdep warning. It is likely that
> > > the previous warning turned off lock debugging, leaving lockdep in an
> > > inconsistent state and leading to the lock downgrade warning.
> > >
> > > Fix that by adding a check for debug_locks at the beginning of
> > > __lock_downgrade().
> > >
> > > Signed-off-by: Waiman Long <longman@xxxxxxxxxx>
> > > Reported-by: Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx>
> >
> > Please also add:
> >
> > Reported-by: syzbot+53383ae265fb161ef488@xxxxxxxxxxxxxxxxxxxxxxxxx
> >
> > for tracking purposes. But Tetsuo deserves lots of credit for debugging it.
>
> I made that:
>
> Reported-by: Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx>
> Debugged-by: Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx>
> Reported-by: syzbot+53383ae265fb161ef488@xxxxxxxxxxxxxxxxxxxxxxxxx
>
> > > index 9593233..e805fe3 100644
> > > --- a/kernel/locking/lockdep.c
> > > +++ b/kernel/locking/lockdep.c
> > > @@ -3535,6 +3535,9 @@ static int __lock_downgrade(struct lockdep_map *lock, unsigned long ip)
> > > unsigned int depth;
> > > int i;
> > >
> > > + if (unlikely(!debug_locks))
> > > + return 0;
> > > +
> >
> > Are we sure this resolves the problem rather than merely making the
> > inconsistency window smaller?
> > I don't understand all the surrounding code, but looking just at this
> > function, it looks like it may just paper over the problem. Say we
> > pass this check while lockdep is still turned on. Then this thread is
> > preempted for some time (e.g. on a virtual CPU), another thread starts
> > reporting a warning and turns lockdep off, some information isn't
> > collected, and then this task resumes and reports a false warning.
>
> Theoretically possible I suppose; but this is analogous to many of the
> other lockdep hooks.
>
> > Or are we holding the mutex here, so that the fact that we are holding
> > it ensures that no other task will take it and no information will be
> > lost?
>
> There is no lock here; for performance reasons we prefer not to acquire
> a global spinlock on every lockdep hook; that would be horrific.

I mean the user mutex itself. Some invariants may hold while we are
holding it, as Tetsuo noted.
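
To make the window concrete, here is a minimal userspace sketch of the
interleaving I have in mind (plain C with pthreads, not kernel code;
debug_locks, held_is_read and the sleep() are stand-ins for the real
lockdep state and the preemption, not the actual internals):

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static int debug_locks = 1;   /* models the global lockdep on/off switch */
static int held_is_read = 1;  /* models per-task held-lock state */

/* models __lock_downgrade() with the new check at the top */
static void *downgrader(void *arg)
{
	(void)arg;
	if (!debug_locks)     /* the patch's early return: passes here */
		return NULL;
	sleep(1);             /* "preempted": the check above goes stale */
	if (!held_is_read)    /* reads state clobbered in the meantime */
		fprintf(stderr, "false \"downgrading a read lock\" warning\n");
	return NULL;
}

/* models another task hitting a warning and turning lockdep off */
static void *reporter(void *arg)
{
	(void)arg;
	debug_locks = 0;      /* debug_locks_off() */
	held_is_read = 0;     /* state left inconsistent afterwards */
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	pthread_create(&a, NULL, downgrader, NULL);
	pthread_create(&b, NULL, reporter, NULL);
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	return 0;
}

(The unsynchronized globals are deliberate; the real hooks also read
debug_locks without a lock.) The check at the top only helps if nothing
can flip debug_locks between that read and the use of the state it
guards; holding the user mutex might provide exactly that guarantee,
which is the question above.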