Re: [PATCH v6 5/6] locking/rwsem: Enable direct rwsem lock handoff

From: Peter Zijlstra
Date: Tue Jan 24 2023 - 07:59:24 EST


On Mon, Jan 23, 2023 at 05:07:08PM -0500, Waiman Long wrote:
> On 1/23/23 12:30, Waiman Long wrote:
> > I will update the patch description to highlight the points that I
> > discussed in this email.
>
> I am planning to update the patch description to as follows:
>
>     The lock handoff provided in rwsem isn't a true handoff like that in
>     the mutex. Instead, it is more like a quiescent state where optimistic
>     spinning and lock stealing are disabled to make it easier for the first
>     waiter to acquire the lock.
>
>     For mutex, lock handoff is done at unlock time as the owner value and
>     the handoff bit is in the same lock word and can be updated atomically.
>
>     That is the not case for rwsem which has a separate count value for
>     locking and an owner value. The only way to update them in a
> quasi-atomic
>     way is to use the wait_lock for synchronization as the handoff bit can
>     only be updated while holding the wait_lock. So for rwsem, the new
>     lock handoff mechanism is done mostly at rwsem_wake() time when the
>     wait_lock has to be acquired anyway to minimize additional overhead.

So for first==reader, sure, and you don't need anything special, since
rwsem_mark_wake() already does the right thing.

But for first==writer, I don't follow; *WHY* do you have to complicate
this path so. The write_slowpath already takes wait_lock for
rwsem_try_write_lock() and that already knows about handoff.

>     It is also likely that the active lock in this case may be a transient
>     RWSEM_READER_BIAS that will be removed soon. So we have a secondary
>     handoff done at reader slow path to handle this particular case.

Only because you made it so damn complicated. If instead you rely on the
wait_lock in write_slowpath you can keep it all in once place AFAICT.

>     For reader-owned rwsem, the owner value other than the
> RWSEM_READER_OWNED
>     bit is mostly for debugging purpose only. So it is not safe to use
>     the owner value to confirm a handoff to a reader has happened. On the

What ?!? Where do we care about the owner value? There's
RWSEM_FLAG_HANDOFF which lives in sem->count and there's
waiter->handoff_set. Nowhere do we care about sem->owner in this.

>     other hand, we can do that when handing off to a writer. However, it
>     is simpler to use the same mechanism to notify a handoff has happened
>     for both readers and writers. So a new HANDOFF_GRANTED state is added

I really can't follow whatever logic jump here.

>     to enum rwsem_handoff_state to signify that. This new value will be
>     written to the handoff_state value of the first waiter.
>
>     With true lock handoff, there is no need to do a NULL owner spinning
>     anymore as wakeup will be performed if handoff is successful. So it
>     is likely that the first waiter won't actually go to sleep even when
>     schedule() is called in this case.

So this spinning, this is purely for writer->write handoff (which is
exceedingly rare since it is readers that set handoff), right?

Why is that so important?

Also, why can't we add something like

owner = rwsem_owner_flags(sem, &flags);
if (owner && !(flags & RWSEM_READER_OWNED))
atomic_long_cond_read_relaxed(&sem->counter, !(VAL & RWSEM_WRITER_LOCKED))

to the start of that? If it's really needed.