Re: [PATCH 1/5] locking: Add rwsem_is_write_locked()

From: Dave Chinner
Date: Thu Sep 07 2023 - 19:00:17 EST

In reply to: Peter Zijlstra: "Re: [PATCH 1/5] locking: Add rwsem_is_write_locked()"
Next in thread: Matthew Wilcox: "Re: [PATCH 1/5] locking: Add rwsem_is_write_locked()"

On Thu, Sep 07, 2023 at 09:38:38PM +0200, Peter Zijlstra wrote:
> On Thu, Sep 07, 2023 at 08:20:30PM +0100, Matthew Wilcox wrote:
> > On Thu, Sep 07, 2023 at 09:08:10PM +0200, Peter Zijlstra wrote:
> > > On Thu, Sep 07, 2023 at 06:47:01PM +0100, Matthew Wilcox (Oracle) wrote:
> > > > Several places want to know whether the lock is held by a writer, instead
> > > > of just whether it's held. We can implement this for both normal and
> > > > rt rwsems. RWSEM_WRITER_LOCKED is declared in rwsem.c and exposing
> > > > it outside that file might tempt other people to use it, so just use
> > > > a comment to note that's what the 1 means, and help anybody find it if
> > > > they're looking to change the implementation.
> > >
> > > I'm presuming this is deep in a callchain where they know they hold the
> > > lock, but they lost in what capacity?
> >
> > No, it's just assertions. You can see that in patch 3 where it's
> > used in functions called things like "xfs_islocked".
>
> Right, but if you're not the lock owner, your answer to the question is
> a dice-roll, it might be locked, it might not be.

Except that the person writing the code knows the call chain that
leads up to that code, and so they have a pretty good idea whether
the object should be locked or not. If we are running that code, and
the object is locked, then it's pretty much guaranteed that the
owner of the lock is the code that executed the check, because otherwise
we have a *major lock implementation bug*.

i.e. if we get to a place where rwsem_is_write_locked() fires
because some other task holds the lock, it almost always means we
have *two* tasks holding the lock exclusively.

Yes, it's these non-lockdep checks in XFS that have found rwsem
implementation bugs in the past. We've had them fire when the lock
was write locked when we knew a few lines earlier it was taken as a
read lock, or marked write locked when it should have been
unlocked, etc., because the rwsem code failed to enforce rw exclusion
semantics correctly.

So, really, these lock checks should be considered in the context of
the code that is running them and what such a "false detection"
would actually mean. In my experience, a false detection like you
talk about above means "rwsems are broken", not that there is a
problem with the code using the rwsems or the rwsem state check.
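
As a rough illustration of the kind of state check being argued for
here, the following is a userspace toy model, not the kernel's rwsem
code. The only detail taken from the thread is that a value of 1 in
the count word means "held by a writer"; the reader bias, the toy_*
names and the toy_inode caller are all assumptions made up for this
sketch.

```c
/* Userspace toy model only; NOT the kernel's rwsem implementation. */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Assumed layout for illustration: bit 0 set means a writer holds the
 * lock; each reader adds TOY_READER_BIAS to the count.
 */
#define TOY_WRITER_LOCKED	1L
#define TOY_READER_BIAS		2L

struct toy_rwsem {
	atomic_long count;
};

static inline void toy_init(struct toy_rwsem *sem)
{
	atomic_init(&sem->count, 0);
}

/* Writer succeeds only if there are no readers and no writer. */
static inline bool toy_down_write_trylock(struct toy_rwsem *sem)
{
	long expected = 0;

	return atomic_compare_exchange_strong(&sem->count, &expected,
					      TOY_WRITER_LOCKED);
}

static inline void toy_up_write(struct toy_rwsem *sem)
{
	atomic_fetch_and(&sem->count, ~TOY_WRITER_LOCKED);
}

/* Reader succeeds only while the writer bit is clear. */
static inline bool toy_down_read_trylock(struct toy_rwsem *sem)
{
	long old = atomic_load(&sem->count);

	while (!(old & TOY_WRITER_LOCKED)) {
		if (atomic_compare_exchange_weak(&sem->count, &old,
						 old + TOY_READER_BIAS))
			return true;
	}
	return false;
}

static inline void toy_up_read(struct toy_rwsem *sem)
{
	atomic_fetch_sub(&sem->count, TOY_READER_BIAS);
}

/* Held by anyone, in any mode. */
static inline bool toy_is_locked(struct toy_rwsem *sem)
{
	return atomic_load(&sem->count) != 0;
}

/* Held by a writer; says nothing about *which* task that is. */
static inline bool toy_is_write_locked(struct toy_rwsem *sem)
{
	return (atomic_load(&sem->count) & TOY_WRITER_LOCKED) != 0;
}

/* Hypothetical caller: the call chain is expected to hold the lock. */
struct toy_inode {
	struct toy_rwsem lock;
	long size;
};

static inline void toy_set_size(struct toy_inode *ip, long size)
{
	/* State check only: fires if nobody holds the lock for write. */
	assert(toy_is_write_locked(&ip->lock));
	ip->size = size;
}
```

The check in toy_set_size() cannot tell who the writer is. The point
made above is that the author knows the call chain, so if the check
passes while some other task is the writer, exclusion has already
failed somewhere else.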

> > > In general I strongly dislike the whole _is_locked family, because it
> > > gives very poorly defined semantics if used by anybody but the owner.
> > >
> > > If these new functions are indeed to be used only by lock holders to
> > > determine what kind of lock they hold, could we please put:
> > >
> > > lockdep_assert_held()
> > >
> > > in them?
> >
> > Patch 2 shows it in use in the MM code. We already have a
> > lockdep_assert_held_write(), but most people don't enable lockdep, so
>
> Most devs should run with lockdep on when writing new code, and I know
> the sanitizer robots run with lockdep on.
>
> In general there seems to be a ton of lockdep on coverage.

*cough*

Bit locks, semaphores, and all sorts of other constructs for IO
serialisation (like inode_dio_wait()) have no lockdep coverage at
all. IOWs, large chunks of many filesystems, the VFS and the VM have
little to no lockdep coverage at all.

> > we also have VM_BUG_ON_MM(!rwsem_is_write_locked(&mm->mmap_lock), mm)
> > to give us a good assertion when lockdep is disabled.
>
> Is that really worth it still? I mean, much of these assertions pre-date
> lockdep.

And we're trying to propagate them because lockdep isn't a viable
option for day to day testing of filesystems because of its
overhead vs how infrequently it finds new problems.

> > XFS has a problem with using lockdep in general, which is that a worker
> > thread can be spawned and use the fact that the spawner is holding the
> > lock. There's no mechanism for the worker thread to ask "Does struct
> > task_struct *p hold the lock?".
>
> Will be somewhat tricky to make happen -- but might be doable. It is
> however an interface that is *very* hard to use correctly. Basically I
> think you want to also assert that your target task 'p' is blocked,
> right?
>
> That is: assert @p is blocked and holds @lock.

That addresses the immediate symptom; it doesn't address the larger
problem with lockdep and needing non-owner rwsem semantics.

i.e. synchronous task based locking models don't work for
asynchronous multi-stage pipeline processing engines like XFS. The
lock protects the data object and follows the data object through
the processing pipeline, whilst the original submitter moves on to
the next operation to process without blocking.

This is the non-blocking, async processing model that io_uring
development is pushing filesystems towards, so assuming that we only
hand a lock to a single worker task and then wait for it to complete
(i.e. synchronous operation) flies in the face of current
development directions...
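
To make the "lock follows the object" model above a little more
concrete, here is a deliberately simplified, single-threaded userspace
sketch. It is not XFS code: the toy_buf and toy_queue types, the
context ids and every function name are hypothetical. It only shows a
lock taken in one context and released in another while the submitter
carries on without waiting.

```c
/* Single-threaded toy model of a two-stage pipeline; not XFS code. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_buf {
	bool locked;		/* binary lock with no owner semantics */
	int locked_by;		/* context that took the lock (debug only) */
	int unlocked_by;	/* context that released it (debug only) */
	int data;
	struct toy_buf *next;
};

struct toy_queue {
	struct toy_buf *head;
	struct toy_buf *tail;
};

static inline bool toy_buf_trylock(struct toy_buf *bp, int ctx)
{
	if (bp->locked)
		return false;
	bp->locked = true;
	bp->locked_by = ctx;
	return true;
}

static inline void toy_buf_unlock(struct toy_buf *bp, int ctx)
{
	/* A state check: the releasing context need not be the locker. */
	assert(bp->locked);
	bp->locked = false;
	bp->unlocked_by = ctx;
}

/*
 * Stage 1: the submitter locks the object, queues it, and returns
 * immediately. The lock stays held and travels with the object.
 */
static inline bool toy_submit(struct toy_queue *q, struct toy_buf *bp,
			      int ctx, int data)
{
	if (!toy_buf_trylock(bp, ctx))
		return false;
	bp->data = data;
	bp->next = NULL;
	if (q->tail)
		q->tail->next = bp;
	else
		q->head = bp;
	q->tail = bp;
	return true;
}

/*
 * Stage 2: a different context dequeues the object, finishes the
 * work, and drops the lock it never acquired itself.
 */
static inline struct toy_buf *toy_complete_one(struct toy_queue *q, int ctx)
{
	struct toy_buf *bp = q->head;

	if (!bp)
		return NULL;
	q->head = bp->next;
	if (!q->head)
		q->tail = NULL;
	toy_buf_unlock(bp, ctx);
	return bp;
}
```

A task-based ownership check would reject the unlock in
toy_complete_one(), because the releasing context is never the one
that acquired the lock. A pure state check such as the assert() in
toy_buf_unlock() remains meaningful in this model.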

-Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx
