Re: [RT WARNING] DEBUG_LOCKS_WARN_ON(rt_mutex_owner(lock) != current) with fsfreeze (4.19.25-rt16)

From: Peter Zijlstra
Date: Tue Apr 30 2019 - 10:02:10 EST


On Tue, Apr 30, 2019 at 03:45:48PM +0200, Sebastian Andrzej Siewior wrote:
> On 2019-04-30 15:28:11 [+0200], Peter Zijlstra wrote:
> > On Tue, Apr 30, 2019 at 02:51:31PM +0200, Sebastian Andrzej Siewior wrote:
> > > On 2019-04-19 10:56:27 [+0200], Juri Lelli wrote:
> > > > On 26/03/19 10:34, Juri Lelli wrote:
> > > > > Hi,
> > > > >
> > > > > Running this reproducer on a 4.19.25-rt16 kernel (with lock debugging
> > > > > turned on) produces warning below.
> > > >
> > > > And I now think this might lead to an actual crash.
> > >
> > > Peter, could you please take a look at the thread:
> > > https://lkml.kernel.org/r/20190419085627.GI4742@xxxxxxxxxxxxxxxxxxxxx
> > >
> > > I assumed that returning to userland with acquired locks is something we
> > > did not want…
> >
> > Yeah, but AFAIK fs freezing code has a history of doing exactly that..
> > This is just the latest incarnation here.
> >
> > So the immediate problem here is that the task doing thaw isn't the same
> > that did freeze, right? The thing is, I'm not seeing how that isn't a
> > problem with upstream either.
> >
> > The freeze code seems to do: percpu_down_write() for the various states,
> > and then frobs lockdep state.
> >
> > Thaw then does the reverse, frobs lockdep and then does: percpu_up_write().
> >
> > percpu_down_write() directly relies on down_write(), and
> > percpu_up_write() on up_write(). And note how __up_write() has:
> >
> > DEBUG_RWSEMS_WARN_ON(sem->owner != current, sem);
> >
> > So why isn't this same code coming unstuck in mainline?
>
> I have to re-route most of these questions to Juri Lelli.
> Lockdep has these gems:
> lockdep_sb_freeze_release() / lockdep_sb_freeze_acquire()

Yeah, saw those, but irrespective of them, the rwsem code (not
percpu_rwsem) should complain about freeze and thaw not being the same
process.
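
To make the concern concrete, here is a minimal userspace sketch of the
owner check quoted above (sem->owner != current). It is not kernel code:
struct task, struct toy_rwsem and the toy_* functions are hypothetical
stand-ins, and it models only the ownership bookkeeping, not the lockdep
handover or any actual blocking.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a task; the kernel would use `current`. */
struct task {
	const char *name;
};

/* Toy write lock that only records who took it. */
struct toy_rwsem {
	const struct task *owner;	/* NULL while unlocked */
	int warnings;			/* owner-mismatch warnings seen so far */
};

static void toy_down_write(struct toy_rwsem *sem, const struct task *cur)
{
	/* Single-threaded model: taking a held lock is a bug in the sketch. */
	assert(sem->owner == NULL);
	sem->owner = cur;
}

static void toy_up_write(struct toy_rwsem *sem, const struct task *cur)
{
	/* Mirrors the quoted check: warn if the releaser is not the owner. */
	if (sem->owner != cur) {
		sem->warnings++;
		fprintf(stderr, "WARN: owner=%s current=%s\n",
			sem->owner ? sem->owner->name : "(none)", cur->name);
	}
	sem->owner = NULL;
}

/* "Freeze" as one task, "thaw" as another; returns the warning count. */
static int toy_freeze_thaw(const struct task *freezer,
			   const struct task *thawer)
{
	struct toy_rwsem sem = { NULL, 0 };

	toy_down_write(&sem, freezer);	/* freeze side takes the lock */
	toy_up_write(&sem, thawer);	/* thaw side releases it */
	return sem.warnings;
}
```

In this model, toy_freeze_thaw() returns 0 when the same task freezes and
thaws, and 1 when the two tasks differ, which is the case I would expect
the rwsem debug check to flag.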

Anyway; it's Oleg and Jan who put this together. I simply don't see how
upstream is correct here.