Re: [PATCH] locking: Document that mutex_unlock() is non-atomic

From: Jann Horn
Date: Fri Dec 01 2023 - 10:58:48 EST


On Fri, Dec 1, 2023 at 10:10 AM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> On Thu, Nov 30, 2023 at 09:48:17PM +0100, Jann Horn wrote:
> > I have seen several cases of attempts to use mutex_unlock() to release an
> > object such that the object can then be freed by another task.
> > My understanding is that this is not safe because mutex_unlock(), in the
> > MUTEX_FLAG_WAITERS && !MUTEX_FLAG_HANDOFF case, accesses the mutex
> > structure after having marked it as unlocked; so mutex_unlock() requires
> > its caller to ensure that the mutex stays alive until mutex_unlock()
> > returns.
> >
> > If MUTEX_FLAG_WAITERS is set and there are real waiters, those waiters
> > have to keep the mutex alive, I think; but we could have a spurious
> > MUTEX_FLAG_WAITERS left if an interruptible/killable waiter bailed
> > between the points where __mutex_unlock_slowpath() did the cmpxchg
> > reading the flags and where it acquired the wait_lock.
> >
> > (With spinlocks, that kind of code pattern is allowed and, from what I
> > remember, used in several places in the kernel.)
> >
> > If my understanding of this is correct, we should probably document this -
> > I think such a semantic difference between mutexes and spinlocks is fairly
> > unintuitive.
>
> IIRC this is true of all sleeping locks, and I think completion was the
> explcicit exception here, but it's been a while.

In addition to completions, I think this also applies to up()? But I
don't know if that's intentionally supported or just an implementation
detail.

Is there some central place where this should be documented instead of
Documentation/locking/mutex-design.rst as a more general kernel
locking design thing? Maybe Documentation/locking/locktypes.rst?

I think it should also be documented on top of the relevant locking
function(s) though, since I don't think everyone who uses locking
functions necessarily reads the separate documentation files first.
Mutexes kind of stand out as the most common locking type, but I guess
to be consistent, we'd have to put the same comment on functions like
up_read() and up_write()? And maybe drop the "Mutexes are different
from spinlocks in this aspect" part?

(Sidenote: Someone pointed out to me that an additional source of
confusion could be that userspace POSIX mutexes support this usage
pattern.)

> > index 78540cd7f54b..087716bfa7b2 100644
> > --- a/Documentation/locking/mutex-design.rst
> > +++ b/Documentation/locking/mutex-design.rst
> > @@ -101,6 +101,12 @@ features that make lock debugging easier and faster:
> > - Detects multi-task circular deadlocks and prints out all affected
> > locks and tasks (and only those tasks).
> >
> > +Releasing a mutex is not an atomic operation: Once a mutex release operation
>
> Well, it very much is an atomic store-release. That is, I object to your
> confusing use of atomic here :-)

I'd say it involves an atomic store-release, but the whole operation
is not atomic. :P

But yeah, I see how this is confusing wording, and I'm not
particularly attached to my specific choice of words.